Hopefully this will make it easier to vary the combine depth threshold per-target.
llvm-svn: 314337
Zeroable APInt
We already have zeroable bits in an APInt. We might as well use that instead of checking for an all zero BUILD_VECTOR.
Differential Revision: https://reviews.llvm.org/D37950
llvm-svn: 314332
instead of using a smaller add and inserting.
In some cases the result psadbw is smaller than the type of the add that started the match. Currently in these cases we are using a smaller add and inserting the result.
If we instead combine the psadbw with zeros and use the full size add, we can take advantage of the implicit zeroing we get if we emit a narrower move before the add.
In a future patch, I want to make isel aware that the psadbw itself already zeroed the upper bits and remove the move entirely.
Differential Revision: https://reviews.llvm.org/D37453
llvm-svn: 314331
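
As a rough scalar model (not the patch itself; names are illustrative), the property being exploited is that psadbw against zero produces a small sum whose upper lane bits are already zero, so a full-width add needs no insert:

```cpp
#include <cstdint>
#include <iostream>

// One 64-bit lane of PSADBW(x, 0), modeled in scalar code: sum the
// absolute differences of eight bytes against zero. The sum fits in
// 16 bits, so bits [63:16] of the lane are already zero.
uint64_t psadbwLane(const uint8_t bytes[8]) {
  uint64_t sum = 0;
  for (int i = 0; i < 8; ++i)
    sum += bytes[i]; // |b - 0| == b for unsigned bytes
  return sum; // at most 8 * 255 = 2040
}

int main() {
  uint8_t v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  // Because the upper bits are known zero, adding at full width gives the
  // same result as a narrow add followed by an insert -- which is the
  // redundancy the patch removes.
  uint64_t wide = psadbwLane(v) + 100;
  std::cout << wide << "\n"; // prints 136
}
```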
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38301
llvm-svn: 314319
This was intended to be no-functional-change, but it's not - there's a test diff.
So I thought I should stop here and post it as-is to see if this looks like what was expected
based on the discussion in PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603
Notes:
1. The test improvement occurs because the existing 'LateSimplifyCFG' marker is not carried
through the recursive calls to 'SimplifyCFG()->SimplifyCFGOpt().run()->SimplifyCFG()'.
The parameter isn't passed down, so we pick up the default value from the function signature
after the first level. I assumed that was a bug, so I've passed 'Options' down in all of the
'SimplifyCFG' calls.
2. I split 'LateSimplifyCFG' into 2 bits: ConvertSwitchToLookupTable and KeepCanonicalLoops.
This would theoretically allow us to differentiate the transforms controlled by those params
independently.
3. We could stash the optional AssumptionCache pointer and 'LoopHeaders' pointer in the struct too.
I just stopped here to minimize the diffs.
4. Similarly, I stopped short of messing with the pass manager layer. I have another question that
could wait for the follow-up: why is the new pass manager creating the pass with LateSimplifyCFG
set to true no matter where in the pipeline it's creating SimplifyCFG passes?
// Create an early function pass manager to cleanup the output of the
// frontend.
EarlyFPM.addPass(SimplifyCFGPass());
-->
/// \brief Construct a pass with the default thresholds
/// and switch optimizations.
SimplifyCFGPass::SimplifyCFGPass()
    : BonusInstThreshold(UserBonusInstThreshold),
      LateSimplifyCFG(true) {} <-- switches get converted to lookup tables and loops may not be in canonical form
If this is unintended, then it's possible that the current behavior of dropping the 'LateSimplifyCFG'
setting via recursion was masking this bug.
Differential Revision: https://reviews.llvm.org/D38138
llvm-svn: 314308
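
A minimal sketch of the recursion bug described in note 1, with hypothetical names (this is the shape of the defect and the fix, not the LLVM code itself):

```cpp
#include <iostream>

// Pre-patch shape: the flag defaults to false and the recursive call
// forgets to forward it, so every level below the first silently runs
// with the default again.
bool simplifyOld(int depth, bool lateSimplify = false) {
  if (depth == 0)
    return lateSimplify;
  return simplifyOld(depth - 1); // 'lateSimplify' dropped here
}

// Patched shape: bundle the knobs in an options struct and thread it
// through every call, so nothing is lost in recursion.
struct SimplifyOptions {
  bool convertSwitchToLookupTable = false;
  bool keepCanonicalLoops = true;
};

bool simplifyNew(int depth, const SimplifyOptions &opts) {
  if (depth == 0)
    return opts.convertSwitchToLookupTable;
  return simplifyNew(depth - 1, opts); // options survive recursion
}

int main() {
  std::cout << simplifyOld(3, true) << "\n";         // 0 -- flag was lost
  std::cout << simplifyNew(3, {true, true}) << "\n"; // 1 -- flag preserved
}
```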
This patch makes analyzeBranch eliminate unconditional branches to the next instruction.
After basic blocks are re-organized by optimizers, such as machine block placement, a BB may end with an unconditional branch to the next (fallthrough) BB. This patch removes such redundant branch instructions.
Differential Revision: https://reviews.llvm.org/D37730
llvm-svn: 314297
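
The transformation reduces to a simple layout check; here is a toy C++ model under the assumption that blocks are laid out in vector order (types and names are hypothetical):

```cpp
#include <cstddef>
#include <iostream>
#include <optional>
#include <vector>

// Hypothetical model: each block may end in an unconditional branch to a
// block index; blocks are laid out in vector order.
struct Block {
  std::optional<std::size_t> branchTo;
};

// Drop branches whose target is the next block in layout order -- the
// fallthrough already gets there for free.
void removeRedundantBranches(std::vector<Block> &blocks) {
  for (std::size_t i = 0; i + 1 < blocks.size(); ++i)
    if (blocks[i].branchTo == i + 1)
      blocks[i].branchTo.reset();
}

int main() {
  std::vector<Block> blocks = {{1}, {3}, {3}, {}};
  removeRedundantBranches(blocks);
  for (const Block &b : blocks)
    std::cout << (b.branchTo ? static_cast<long>(*b.branchTo) : -1L) << " ";
  std::cout << "\n"; // prints: -1 3 -1 -1
}
```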
Differential Revision: https://reviews.llvm.org/D37473
llvm-svn: 314295
results.
As commented on D37849 and rL313547, AVX1 targets were missing a chance to use vmovmskpd for v4f64/v4i64 results for bool vector bitcasts.
llvm-svn: 314293
I implemented isTruncateFree in rL313533; this patch fixes the logic
to match my comment, as the previous logic was too general. Now the
only truncates that are free are i64 -> i32.
Differential Revision: https://reviews.llvm.org/D38234
llvm-svn: 314280
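
As a standalone restatement of the tightened predicate (a simplified stand-in for the TargetLowering hook, not the actual signature):

```cpp
#include <cassert>

// Simplified stand-in for the hook: only an i64 -> i32 truncate is free.
bool isTruncateFreeSketch(unsigned srcBits, unsigned dstBits) {
  return srcBits == 64 && dstBits == 32;
}

int main() {
  assert(isTruncateFreeSketch(64, 32));  // the one free case
  assert(!isTruncateFreeSketch(64, 16)); // no longer considered free
  assert(!isTruncateFreeSketch(32, 16));
}
```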
This is necessary, but not sufficient, for having working SJLJ exception
handling on x86_64.
Differential Revision: https://reviews.llvm.org/D38254
llvm-svn: 314277
The callsite value is already stored indexed from 0 in
the _Unwind_Context struct. When accessed via the functions
_Unwind_GetIP and _Unwind_SetIP, the value is indexed from 1,
but those functions handle the offsetting. When reading directly
from the struct here, we shouldn't subtract 1.
This matches the code generated by the ARM target, where SJLJ
exception handling is used by default on iOS.
This makes clang-built object files for 32 bit x86 mingw work when
linked with libgcc/libstdc++.
Differential Revision: https://reviews.llvm.org/D38251
llvm-svn: 314276
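
A hedged sketch of the indexing convention described above; the struct and accessors are simplified stand-ins for the real _Unwind_Context layout and entry points:

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the SJLJ _Unwind_Context: the call-site slot
// stores the value indexed from 0.
struct UnwindContextSketch {
  uintptr_t callSite; // stored 0-indexed
};

// The accessors expose the value 1-indexed and do the offsetting
// themselves (names echo, but are not, the real entry points).
uintptr_t getIP(const UnwindContextSketch &c) { return c.callSite + 1; }
void setIP(UnwindContextSketch &c, uintptr_t ip) { c.callSite = ip - 1; }

int main() {
  UnwindContextSketch c{0};
  setIP(c, 42);                    // accessor stores 41
  std::cout << getIP(c) << "\n";   // 42: offset applied symmetrically
  std::cout << c.callSite << "\n"; // 41: a direct read must not subtract
                                   // 1 again -- that was the bug
}
```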
build vectors.
llvm-svn: 314274
Summary:
In rare cases, loads that were marked as strided loads but didn't get
prefetched could cause a crash if they occurred in a loop with other
colliding loads.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38261
llvm-svn: 314252
correct some opcode tag computations.
Summary:
This addresses a correctness bug for LD[1234]*_POST opcodes that have
the prefetcher fix applied to them: the base register was not being
written back from the temp after being incremented, so it would appear
to never be incremented.
Also, fix some opcode tag computations based on some updated HW details
to get better tag avoidance and thus better prefetcher performance.
Reviewers: mcrosier
Subscribers: aemerson, rengolin, javed.absar, kristof.beyls
Differential Revision: https://reviews.llvm.org/D38256
llvm-svn: 314251
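
A toy model of the write-back defect, assuming nothing about the real opcode implementation beyond what the message states:

```cpp
#include <iostream>

// Toy model of a post-indexed load: return the loaded value and advance
// the base pointer. The bug was the moral equivalent of incrementing the
// temporary but never writing it back to 'base'.
int loadPost(const int *&base, int offset) {
  const int *tmp = base; // the fixed opcodes work through a temp register
  int value = *tmp;
  tmp += offset;
  base = tmp; // the write-back that was missing: without it the base
              // register appears to never be incremented
  return value;
}

int main() {
  int data[4] = {10, 20, 30, 40};
  const int *p = data;
  std::cout << loadPost(p, 1) << " " << loadPost(p, 1) << "\n"; // 10 20
}
```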
llvm-svn: 314250
postRA pseudos like the NOREX version of TEST.""
The late MOV8rr_NOREX that caused the crash has been removed.
llvm-svn: 314249
This hook is called after register allocation with two physical registers. We don't need a separate instruction at that time to force register class constraints. I left in the assert though. We also have a fatal error in X86MCCodeEmitter if we ever encode an H-reg and a REX prefix.
llvm-svn: 314248
llvm-svn: 314247
instructions
In the past while, I've committed a number of patches in the PowerPC back end
aimed at eliminating comparison instructions. However, this causes some failures
in proprietary source and these issues are not observed in SPEC or any open
source packages I've been able to run.
As a result, I'm pulling the entire series and will refactor it to:
- Have a single entry point for easy control
- Have fine-grained control over which patterns we transform
A side-effect of this is that test cases for these patches (and modified by
them) are XFAIL-ed. This is a temporary measure as it is counter-productive
to remove/modify these test cases and then have to modify them again when
the refactored patch is recommitted.
The failure will be investigated in parallel to the refactoring effort and
the recommit will either have a fix for it or will leave this transformation
off by default until the problem is resolved.
llvm-svn: 314244
X86InterleavedAccess (VF{8|16|32} stride 3)
This patch expands the support of lowerInterleavedStore to {8|16|32}x8i stride 3.
LLVM creates suboptimal shuffle code-gen for AVX2. Overall, this patch is a specific fix for the pattern (Stride=3, VF={8|16|32}).
This patch is part two of two and covers the store (interleaved) side.
The patch goal is to optimize the following sequence:
a0 a1 a2 a3 a4 a5 a6 a7
b0 b1 b2 b3 b4 b5 b6 b7
c0 c1 c2 c3 c4 c5 c6 c7
into
a0 b0 c0 a1 b1 c1 a2 b2
c2 a3 b3 c3 a4 b4 c4 a5
b5 c5 a6 b6 c6 a7 b7 c7
Reviewers: zvi, guyblank, dorit, Ayal
Differential Revision: https://reviews.llvm.org/D37117
Change-Id: I56ced8bcbea809a37654060771911ade20246ccc
llvm-svn: 314234
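
For reference, the memory layout being produced is plain stride-3 interleaving; a scalar C++ model of the semantics (VF fixed at 8 for brevity):

```cpp
#include <array>
#include <cstddef>
#include <cstdio>

// Reference semantics of the stride-3 interleaved store: rows a, b, c
// are written as a0 b0 c0 a1 b1 c1 ...
constexpr std::size_t VF = 8;

std::array<char, 3 * VF> interleave3(const std::array<char, VF> &a,
                                     const std::array<char, VF> &b,
                                     const std::array<char, VF> &c) {
  std::array<char, 3 * VF> out{};
  for (std::size_t i = 0; i < VF; ++i) {
    out[3 * i + 0] = a[i];
    out[3 * i + 1] = b[i];
    out[3 * i + 2] = c[i];
  }
  return out;
}

int main() {
  std::array<char, VF> a, b, c;
  for (std::size_t i = 0; i < VF; ++i) {
    a[i] = 'a'; b[i] = 'b'; c[i] = 'c';
  }
  for (char ch : interleave3(a, b, c))
    std::putchar(ch); // abcabcabcabcabcabcabcabc
  std::putchar('\n');
}
```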
Differential Revision: https://reviews.llvm.org/D38191
llvm-svn: 314223
Summary: This patch extends the v8i32/v4i32 custom lowering to support v16i32
Reviewers: zvi, RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D38274
llvm-svn: 314221
llvm-svn: 314216
Make sure that "initializeSubtargetDependencies" sets all members that
InstrInfo and the like may depend on.
llvm-svn: 314214
The XOP rotations act as ROTL with +ve values and ROTR with -ve values, which means that we can treat them all as ROTL with unsigned modulo. We already check that we're only trying to lower as ROTL for XOP rotations.
Differential Revision: https://reviews.llvm.org/D37949
llvm-svn: 314207
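
The modulo equivalence can be checked in scalar code; a small sketch for 32-bit values:

```cpp
#include <cassert>
#include <cstdint>

// XOP-style rotate: a negative amount means rotate right, but after an
// unsigned modulo by the bit width it is just a left rotate.
uint32_t rotl32(uint32_t x, int amt) {
  unsigned n = static_cast<unsigned>(amt) % 32; // unsigned modulo
  return n == 0 ? x : (x << n) | (x >> (32 - n));
}

int main() {
  uint32_t x = 0x12345678u;
  // ROTR by 8 == ROTL by -8 == ROTL by (-8 mod 32) == ROTL by 24.
  assert(rotl32(x, -8) == ((x >> 8) | (x << 24)));
  assert(rotl32(x, -8) == rotl32(x, 24));
  return 0;
}
```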
https://bugs.llvm.org//show_bug.cgi?id=29061
Don't try referencing REX-needed regs when not on 64bit mode
Aligns to GCC
Differential Revision: https://reviews.llvm.org/D37801
llvm-svn: 314203
pseudos like the NOREX version of TEST."
Makes llc crash. This reverts commit r314151.
llvm-svn: 314199
llvm side.
Removing X86 broadcast(f/i)32x2 intrinsics from llvm.
Adding autoUpgrade support.
Moving matching tests from avx512dq-intrinsics.ll to avx512dq-intrinsics-upgrade.ll and from avx512dqvl-intrinsics.ll to avx512dqvl-intrinsics-upgrade.ll.
Differential Revision: https://reviews.llvm.org/D38220
llvm-svn: 314195
Thanks to Eli Friedman for the suggestion.
llvm-svn: 314182
spot as the original MBB
Discovered in avr-rust/rust#62
https://github.com/avr-rust/rust/issues/62
Patch by Gergo Erdi.
llvm-svn: 314180
This was an oversight in the original backend data layout.
The AVR architecture does not have the concept of unaligned loads - all
loads/stores from all addresses are aligned to one byte.
Discovered in avr-rust issue #64
https://github.com/avr-rust/rust/issues/64
Patch by Gergo Erdi.
llvm-svn: 314179
It leads to some improvements, but also a regression for the simple
case, so it's not clearly a good idea.
test/CodeGen/ARM/vcvt.ll now has test coverage to show the difference.
Ultimately, the right solution is probably to custom-lower fp-to-int
conversions, to something like ARMISD::VCVT_F32_S32 plus a bitcast.
It's hard to do the right thing when the implicit bitcast isn't visible
to DAG transforms.
llvm-svn: 314169
R12 is used for the SwiftError parameter. It is no longer a CSR, as it
is used to transfer the SwiftError, and the caller must preserve it if
they need to.
llvm-svn: 314165
instead.
As far as I know SUBREG_TO_REG is stating that the upper bits are 0. But if we are just converting the GR32 with no checks, then we have no reason to say the upper bits are 0.
I don't really know how to test this today since I can't find anything that looks that closely at SUBREG_TO_REG. The test changes here seem to be some perturbation of register allocation.
Differential Revision: https://reviews.llvm.org/D38001
llvm-svn: 314152
the NOREX version of TEST.
llvm-svn: 314151
No functionality change intended.
llvm-svn: 314143
builtins.", rL314135.
Causing assertion failures on macos:
> Assertion failed: (Num < NumOperands && "Invalid child # of SDNode!"),
> function getOperand, file
> /Users/buildslave/jenkins/workspace/clang-stage1-cmake-RA-incremental/llvm/include/llvm/CodeGen/SelectionDAGNodes.h,
> line 835.
http://green.lab.llvm.org/green/job/clang-stage1-cmake-RA-incremental/42739/testReport/LLVM/CodeGen_NVPTX/surf_read_cuda_ll/
llvm-svn: 314142
x86-asm-syntax=intel (PR34617).
Fix for incorrect code generation when x86-asm-syntax=intel.
Differential Revision: https://reviews.llvm.org/D37945
llvm-svn: 314140
vectors
This teaches SimplifyDemandedBits to handle constant splat vector shifts.
This required changing some uses of getZExtValue to getLimitedValue since we can't rely on legalization using getShiftAmountTy for the shift amount.
I believe there may have been a bug in the ((X << C1) >>u ShAmt) handling where we didn't check if the inner shift was too large. I've fixed that here.
I had to add new patterns to ARM because the zext/sext that the patterns were trying to look for got turned into an any_extend with this patch. Happy to split that out too, but not sure how to test without this change.
Differential Revision: https://reviews.llvm.org/D37665
llvm-svn: 314139
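
A toy restatement of the inner-shift guard mentioned above, for 32-bit values (illustrative only; the real code works on APInt/KnownBits):

```cpp
#include <cstdint>
#include <iostream>

constexpr unsigned BitWidth = 32;

// Known-zero low bits of (X << C): the low C bits become zero -- but only
// when C is a valid shift amount. Skipping the C < BitWidth guard on the
// inner shift of ((X << C1) >>u ShAmt) is the kind of bug fixed here.
uint32_t knownZeroLowBitsOfShl(unsigned c) {
  if (c >= BitWidth)
    return 0; // over-wide shift: claim nothing instead of nonsense
  return c == 0 ? 0 : (1u << c) - 1;
}

int main() {
  std::cout << std::hex;
  std::cout << knownZeroLowBitsOfShl(8) << "\n";  // ff: low 8 bits zero
  std::cout << knownZeroLowBitsOfShl(40) << "\n"; // 0: invalid inner shift
}
```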
Add two callbacks to MachineEvaluator, so that specific implementations
can specify more details about register classes:
- composeWithSubRegIndex(RC,Idx), to provide the register class for a
register from RC used in conjunction with a subregister index Idx.
- getPhysRegBitWidth(Reg), to provide the size in bits of the given
physical register.
llvm-svn: 314136
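
A sketch of the two hooks as a C++ interface, with opaque stand-ins for LLVM's register-class and register types (the real hooks live on MachineEvaluator):

```cpp
#include <cstdint>

struct RegisterClass; // opaque stand-in for LLVM's TargetRegisterClass

// Sketch of the two new hooks with simplified types.
class MachineEvaluatorSketch {
public:
  virtual ~MachineEvaluatorSketch() = default;

  // Register class for a register from RC when used together with
  // subregister index Idx.
  virtual const RegisterClass *
  composeWithSubRegIndex(const RegisterClass *RC, unsigned Idx) const = 0;

  // Size in bits of the given physical register.
  virtual uint16_t getPhysRegBitWidth(unsigned Reg) const = 0;
};

// A trivial target where subindexing never changes the class and every
// physical register is 32 bits wide.
struct TrivialEvaluator final : MachineEvaluatorSketch {
  const RegisterClass *composeWithSubRegIndex(const RegisterClass *RC,
                                              unsigned) const override {
    return RC;
  }
  uint16_t getPhysRegBitWidth(unsigned) const override { return 32; }
};

int main() {
  TrivialEvaluator e;
  return e.getPhysRegBitWidth(0) == 32 ? 0 : 1;
}
```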
Differential Revision: https://reviews.llvm.org/D38191
llvm-svn: 314135
llvm-svn: 314134
insert_subvector with zero after masked compares with fewer patterns with predicate
This replaces the large number of patterns that handle every possible case of zeroing after a masked compare with a few simpler patterns that use a predicate to check for a masked compare producer.
This is similar to what we do for detecting free GR32->GR64 zero extends and free xmm->ymm/zmm zero extends.
This shrinks the isel table from ~590k to ~531k. This is a roughly 10% reduction in size.
Differential Revision: https://reviews.llvm.org/D38217
llvm-svn: 314133
We use a differently ordered CSR set if the frame pointer is pushed. Add a
matching ..._SwiftError version.
llvm-svn: 314128
A ternary is clearer here. No functionality change.
llvm-svn: 314123
Noticed by inspection
llvm-svn: 314121
X86InterleavedAccess (VF8 stride 4):
This patch expands the support of lowerInterleavedStore to 8x8i stride 4.
LLVM creates suboptimal shuffle code-gen for AVX2.
Overall, this patch is a specific fix for the pattern (Stride=4, VF=8), and we plan to include more patterns in the future.
The patch goal is to optimize the following sequence:
At the end of the computation, we have xmm2, xmm0, xmm12 and xmm3 each holding 8 chars:
c0, c1, ..., c7
m0, m1, ..., m7
y0, y1, ..., y7
k0, k1, ..., k7
And these need to be transposed/interleaved and stored like so:
c0 m0 y0 k0 c1 m1 y1 k1 c2 m2 y2 k2 c3 m3 y3 k3 ....
Reviewers: DavidKreitzer, Farhana, zvi, igorb, guyblank, RKSimon, Ayal
Differential Revision: https://reviews.llvm.org/D36058
Change-Id: I3cc5c2ca5d6318901c192a4428493b99ef424c32
llvm-svn: 314109
As mentioned in https://reviews.llvm.org/D33718, this simply adds another
pattern to the compare elimination sequence and is committed without a
differential review.
llvm-svn: 314106
llvm-svn: 314105
Change-Id: I1ddc619169fae6a56308deef8dae5db3da702cf4
llvm-svn: 314103