| Commit message | Author | Age | Files | Lines |
Summary:
When PR16508 was solved (in rL185363) a regression test was
added as test/CodeGen/PowerPC/2013-07-01-PHIElimBug.ll.
I discovered that the test case no longer reproduced the
scenario from PR16508. This problem could have been addressed
by adding an extra RUN line with "-O1" (or possibly "-O0"),
but instead I added a mir-reproducer
test/CodeGen/PowerPC/2013-07-01-PHIElimBug.mir
to get a reproducer that is less sensitive to changes in
earlier passes (including O-level).
While at it, I also corrected a code comment in
PHIElimination::EliminatePHINodes that has been incorrect
since the related bugfix from rL185363.
Reviewers: MatzeB, hfinkel
Reviewed By: MatzeB
Subscribers: nemanjai, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D52553
llvm-svn: 343416
llvm-svn: 343404
Exposes another extractelement(bitcast(scalartovector())) pattern
llvm-svn: 343403
on compiling for a CPU with single uop BEXTR
Summary:
This function turns (X >> C1) & C2 into a BMI BEXTR or TBM BEXTRI instruction. For BMI BEXTR we have to materialize an immediate into a register to feed to the BEXTR instruction.
The BMI BEXTR instruction is 2 uops on Intel CPUs. It looks like on SKL it's one port 0/6 uop and one port 1/5 uop, despite what Agner's tables say. I know one of the uops is a regular shift uop, so it would have to go through the port 0/6 shifter unit. So that's the same or worse, execution-wise, than the shift+and, which is one 0/6 uop and one 0/1/5/6 uop. The move of the immediate into a register is an additional 0/1/5/6 uop.
For now I've limited this transform to AMD CPUs which have a single-uop BEXTR. It may also make sense if we can fold a load, or if the AND immediate is larger than 32 bits and can't be encoded as a sign-extended 32-bit value, or if LICM or CSE can hoist the move immediate and share it. But we'd need to look more carefully at that. In the regression I looked at, it doesn't look like load folding or large immediates were occurring, so the regression isn't caused by the loss of those. So we could try to be smarter here if we find a compelling case.
Reviewers: RKSimon, spatel, lebedev.ri, andreadb
Reviewed By: RKSimon
Subscribers: llvm-commits, andreadb, RKSimon
Differential Revision: https://reviews.llvm.org/D52570
llvm-svn: 343399
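For reference, a minimal IR sketch of the pattern this transform matches (the function name and constants are illustrative, not taken from the patch):

  ; (X >> C1) & C2 - previously selected to BEXTR whenever BMI/TBM was
  ; available; now only when the CPU has a single-uop BEXTR.
  define i32 @extract_bits(i32 %x) {
    %shifted = lshr i32 %x, 12
    %masked = and i32 %shifted, 255
    ret i32 %masked
  }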
llvm-svn: 343392
llvm-svn: 343391
shuffles before simplifying inputs
By removing demanded target shuffles that simplify to zero/undef/identity before simplifying their inputs, we improve the chances of further simplification, as only the immediate parent user of the combined node is added back to the work list. This still doesn't help us if it's passed through other ops, though (bitcasts, etc.).
llvm-svn: 343390
added to clang
This adds tests for:
_mm_loadu_si64
_mm_loadu_si32
_mm_loadu_si16
_mm_storeu_si64
_mm_storeu_si32
_mm_storeu_si16
llvm-svn: 343389
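A rough sketch of the IR behind one of these tests, assuming the documented behavior of _mm_loadu_si32 (unaligned 32-bit load, upper lanes zeroed); this is an illustration, not the committed test:

  define <2 x i64> @sketch_loadu_si32(i8* %p) {
    %ptr = bitcast i8* %p to i32*
    %val = load i32, i32* %ptr, align 1
    %vec = insertelement <4 x i32> zeroinitializer, i32 %val, i32 0
    %res = bitcast <4 x i32> %vec to <2 x i64>
    ret <2 x i64> %res
  }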
bits via shuffles
Exposed an issue that recursive calls to getTargetConstantBitsFromNode don't handle changes to EltSizeInBits yet.
llvm-svn: 343384
llvm-svn: 343376
ISD::EXTRACT_SUBVECTOR
llvm-svn: 343375
The shift amount might have peeked through an extract_subvector, altering the number of vector elements in the 'Amt' variable - so we were incorrectly calculating the ratio when peeking through bitcasts, resulting in incorrectly detecting splats.
llvm-svn: 343373
Correctly check for relocations in the constant to promote. And don't
allow promoting a constant multiple times.
This partially fixes https://bugs.llvm.org//show_bug.cgi?id=32780 ;
it's not a complete fix because we also need to prevent
ARMConstantIslands from cloning the constant.
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51472
llvm-svn: 343361
This mostly affects IR generated by non-clang frontends because clang
generally sets the alignment of globals explicitly.
Fixes https://bugs.llvm.org//show_bug.cgi?id=32394 .
(-arm-promote-constant is currently off by default, and it stays off
with this patch. I'll look into turning it on again when all the known
issues are fixed.)
Differential Revision: https://reviews.llvm.org/D51469
llvm-svn: 343359
instructions when a truncate is between the CMP and the AND and the sign flag is used.
The code in X86ISelDAGToDAG only looks through truncates if the sign flag isn't used, but that is overly restrictive. A future patch will improve this.
llvm-svn: 343355
Split the `zcz` feature into specific ones for GP and FP registers, `zcz-gp`
and `zcz-fp`, respectively, while retaining the original feature option to
mean both.
Differential revision: https://reviews.llvm.org/D52621
llvm-svn: 343354
- asan buildbots are breaking and I need to investigate the issue
llvm-svn: 343341
https://reviews.llvm.org/D51147
Asserting on any extend of vectors should be up to the target's
legalizer / target-specific code, not CallLowering.
Reviewed by: dsanders.
llvm-svn: 343325
- Add fix so that all code paths that create DWARFContext
with an ObjectFile initialise the target architecture in the context
- Add an assert that the Arch is known in the Dwarf CallFrameString method
llvm-svn: 343317
Lower integer arguments larger than 32 bits for MIPS32.
setMostSignificantFirst is used in order for G_UNMERGE_VALUES and
G_MERGE_VALUES to always hold registers in the same order, regardless of
endianness.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D52409
llvm-svn: 343315
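A minimal illustration (not taken from the patch) of an argument this now handles: a 64-bit integer argument on MIPS32 is split across two 32-bit registers, and G_UNMERGE_VALUES/G_MERGE_VALUES keep the halves in a fixed order regardless of endianness.

  define i64 @add64(i64 %a, i64 %b) {
  entry:
    %sum = add i64 %a, %b
    ret i64 %sum
  }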
On GreenDragon `CodeGen/X86/cpus.ll` is timing out on the bot with Asan
and UBSan enabled. With the same configuration on my machine, the test
passes but takes more than 3 minutes to do so. I could increase the
timeout, but I believe it makes more sense to split up the test because
it allows for more parallelism.
Differential revision: https://reviews.llvm.org/D52603
llvm-svn: 343313
Double throughput to account for 2 pipes + fix BSF's latency/uop counts
Match AMD Fam16h SOG + llvm-exegesis tests
llvm-svn: 343311
The NoMovt feature prevents the use of MOVW/MOVT
instructions on Cortex-M23 for performance reasons.
These instructions are required for execute-only code,
so NoMovt should be disabled when that option is enabled.
Differential Revision: https://reviews.llvm.org/D52551
llvm-svn: 343302
PHMINPOS can run on either JFPU pipe
llvm-svn: 343299
The assembly for this test should be optimal now after changes to the ScalarizeMaskedMemIntrin patch.
llvm-svn: 343281
passthru vector and insert the new load results into it.
Previously we started with undef and did a final merge with the passthru at the end.
llvm-svn: 343273
passthru value and insert each conditional load result over its element.
Previously we started with undef and did one final merge at the end with a select.
llvm-svn: 343271
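For context, the passthru referred to here is the last operand of the masked load intrinsic; lanes whose mask bit is false take their value from it. A minimal example (types and names are illustrative):

  declare <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>*, i32, <4 x i1>, <4 x i32>)

  define <4 x i32> @masked_load(<4 x i32>* %ptr, <4 x i1> %mask, <4 x i32> %passthru) {
    ; The scalarized expansion now starts from %passthru and inserts each
    ; loaded element into it, instead of starting from undef.
    %res = call <4 x i32> @llvm.masked.load.v4i32.p0v4i32(<4 x i32>* %ptr, i32 4, <4 x i1> %mask, <4 x i32> %passthru)
    ret <4 x i32> %res
  }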
This allows to reduce a number of used VGPRs in some cases.
Differential Revision: https://reviews.llvm.org/D52577
llvm-svn: 343249
values. That's just %x so use that directly.
Had we emitted this IR earlier, InstCombine would have removed the icmp, so I'm going to assume using the i1 directly would be considered canonical.
llvm-svn: 343244
llvm-svn: 343235
Summary: It is currently broken and for Sparc there is not much benefit
in using a builtin version compared to a library version. Both versions
need to store the same four values in setjmp and flush the register
windows in longjmp. If the need for a builtin setjmp/longjmp arises there
is an improved implementation available at https://reviews.llvm.org/D50969.
Reviewers: jyknight, joerg, venkatra
Subscribers: fedor.sergeev, jrtc27, llvm-commits
Differential Revision: https://reviews.llvm.org/D51487
llvm-svn: 343210
llvm-svn: 343192
We now generate cttz with the zero_undef flag set to false. This allows -O0 to avoid the zero check.
llvm-svn: 343127
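For reference, zero_undef is the second operand of the cttz intrinsic; with it set to false the result is defined (the operand bit width) for a zero input, so no explicit zero check is required:

  declare i32 @llvm.cttz.i32(i32, i1)

  define i32 @trailing_zeros(i32 %x) {
    ; i1 false: the result is well defined (32) when %x is 0.
    %tz = call i32 @llvm.cttz.i32(i32 %x, i1 false)
    ret i32 %tz
  }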
llvm-svn: 343114
- Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll
llvm-svn: 343112
It was the case when calling MO::dump(), but MI::dump() was still
depending on hasComplexRegisterTies().
The MIR output is not affected.
llvm-svn: 343107
This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.
> Functions that have signed return addresses need additional dwarf support:
> - After signing the LR, and before authenticating it, the LR register is in a
> state the is unusable by a debugger or unwinder
> - To account for this a new directive, .cfi_negate_ra_state, is added
> - This directive says the signed state of the LR register has now changed,
> i.e. unsigned -> signed or signed -> unsigned
> - This directive has the same CFA code as the SPARC directive GNU_window_save
> (0x2d), adding a macro to account for multiply defined codes
> - This patch matches the gcc implementation of this support:
> https://patchwork.ozlabs.org/patch/800271/
>
> Differential Revision: https://reviews.llvm.org/D50136
llvm-svn: 343103
This patch adds a check to optimize conditional branches (BC and BCn) based on a constant set by CRSET or CRUNSET.
Other optimizers, such as block placement, may generate such code and hence
I do this at the very end of the optimization pipeline, in the pre-emit peephole pass.
A conditional branch based on a constant is eliminated or converted into an unconditional branch.
Also, CRSET/CRUNSET is eliminated if the condition code register is not used
by instructions other than the branch to be optimized.
Differential Revision: https://reviews.llvm.org/D52345
llvm-svn: 343100
The previously reduced version used `urem <9 x i32> zeroinitializer, %tmp`, which D52504 will simplify.
llvm-svn: 343097
Similar to the existing ISD::SRL constant vector shifts from D49562, this patch adds ISD::SRA support with ISD::MULHS.
As we're dealing with signed values, we have to handle the shift-by-zero and shift-by-one special cases, so XOP+AVX2/AVX512 splitting/extension is still a better solution - really we should still use ISD::MULHS if one of the special cases is used, but for now I've just left a TODO and filtered by isKnownNeverZero.
Differential Revision: https://reviews.llvm.org/D52171
llvm-svn: 343093
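As a concrete illustration (shift amounts chosen arbitrarily), the non-uniform constant vector shifts this covers look like:

  ; An ashr by per-lane constants; with this change it can be lowered through
  ; a multiply-high (ISD::MULHS) pattern.
  define <8 x i16> @ashr_non_uniform(<8 x i16> %x) {
    %r = ashr <8 x i16> %x, <i16 3, i16 4, i16 5, i16 6, i16 3, i16 4, i16 5, i16 6>
    ret <8 x i16> %r
  }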
When calculating whether a value can safely overflow for use by an
icmp, we weren't checking that the value couldn't wrap around. To do
this we need the icmp to be using a constant, as well as the incoming
add or sub.
bugzilla report: https://bugs.llvm.org/show_bug.cgi?id=39060
Differential Revision: https://reviews.llvm.org/D52463
llvm-svn: 343092
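A minimal sketch (illustrative, not the PR39060 reproducer) of the ingredients involved: a narrow sub feeding an icmp, where both must use constants before the pass can reason about wrapping:

  define i1 @in_range(i8 %x) {
    ; %sub can wrap in i8, so both the sub and the icmp need constant
    ; operands for the overflow check to be valid.
    %sub = sub i8 %x, 10
    %cmp = icmp ult i8 %sub, 20
    ret i1 %cmp
  }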
Adding NonNull as an attribute to returned pointers has the unfortunate side
effect of disabling tail calls. This patch ignores the NonNull attribute when
we decide whether to tail merge, in the same way that we ignore the NoAlias
attribute, as it has no effect on the call sequence.
Differential Revision: https://reviews.llvm.org/D52238
llvm-svn: 343091
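A minimal sketch of the situation (names are illustrative): a nonnull return attribute on the caller should not, by itself, prevent the call below from being emitted as a tail call.

  declare i8* @make_object()

  ; nonnull on the caller's return is now ignored when comparing return
  ; attributes for tail merging, just as noalias already is.
  define nonnull i8* @wrapper() {
    %p = tail call i8* @make_object()
    ret i8* %p
  }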
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D51495
llvm-svn: 343090
Functions that have signed return addresses need additional dwarf support:
- After signing the LR, and before authenticating it, the LR register is in a
state that is unusable by a debugger or unwinder
- To account for this a new directive, .cfi_negate_ra_state, is added
- This directive says the signed state of the LR register has now changed,
i.e. unsigned -> signed or signed -> unsigned
- This directive has the same CFA code as the SPARC directive GNU_window_save
(0x2d), adding a macro to account for multiply defined codes
- This patch matches the gcc implementation of this support:
https://patchwork.ozlabs.org/patch/800271/
Differential Revision: https://reviews.llvm.org/D50136
llvm-svn: 343089
This broke Chromium's Android build (https://crbug.com/889390) and the
polly-aosp buildbot
(http://lab.llvm.org:8011/builders/aosp-O3-polly-before-vectorizer-unprofitable).
> Originally committed in rL342210 but was reverted in rL342260 because
> it was causing issues in vectorized code, because I had forgotten to
> ensure that we're operating on scalar values.
>
> Original commit message:
>
> On failing to find sequences that can be converted into dual macs,
> try to find sequential 16-bit loads that are used by muls which we
> can then use smultb, smulbt, smultt with a wide load.
>
> Differential Revision: https://reviews.llvm.org/D51983
llvm-svn: 343082
This reverts commit bd7b44f35ee9fbe365eb25ce55437ea793b39346.
Reland of r342994: the optimization is disabled by default and explicitly enabled in the test via
-mllvm -consthoist-min-num-to-rebase<unsigned>=0
[ConstHoist] Do not rebase single (or few) dependent constant
If an instance (InsertionPoint or IP) of Base constant A has only one or a few
rebased constants depending on it, do NOT rebase. One extra ADD instruction is
required to materialize each rebased constant, assuming A and the rebased constant
have the same materialization cost.
Differential Revision: https://reviews.llvm.org/D52243
llvm-svn: 343053
Summary:
Lowers (s|u)itofp and fpto(s|u)i instructions for vectors. The fp to
int conversions produce poison values if their arguments are out of
the convertible range, so a future CL will have to add an LLVM
intrinsic to make the saturating behavior of this conversion usable.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D52372
llvm-svn: 343052
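For reference, these are the plain IR conversions being lowered; the out-of-range poison behavior mentioned above is a property of fptoui/fptosi themselves (the types here are just an example):

  define <4 x i32> @to_unsigned(<4 x float> %v) {
    ; Lanes whose value does not fit in i32 after truncation yield poison.
    %r = fptoui <4 x float> %v to <4 x i32>
    ret <4 x i32> %r
  }

  define <4 x float> @from_signed(<4 x i32> %v) {
    %r = sitofp <4 x i32> %v to <4 x float>
    ret <4 x float> %r
  }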
Differential Revision: https://reviews.llvm.org/D52522
llvm-svn: 343047
input types.
This removes an int->fp bitcast between the surrounding code and the movmsk. I had already added a hack to combineMOVMSK to try to look through this bitcast to improve the SimplifyDemandedBits there.
But I found an additional issue where the bitcast was preventing combineMOVMSK from being called again after earlier nodes in the DAG were optimized. The bitcast gets revisited, but not the user of the bitcast. By using integer types throughout, the bitcast doesn't get in the way.
llvm-svn: 343046
These IR patterns represent the exact behavior of a movmsk instruction using (zext (bitcast (icmp slt X, 0))).
For the v4i32/v8i32/v2i64/v4i64 cases we currently emit a PCMPGT for the icmp slt, which is unnecessary since we only care about the sign bit of the result. This is because of the int->fp bitcast we put on the input to the movmsk nodes for these cases. I'll be fixing this in a future patch.
llvm-svn: 343045
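The pattern in question, written out as IR (the lane count is just an example):

  ; Collects the sign bit of each lane - exactly what MOVMSK produces.
  define i32 @movmsk_v4i32(<4 x i32> %x) {
    %cmp = icmp slt <4 x i32> %x, zeroinitializer
    %bits = bitcast <4 x i1> %cmp to i4
    %res = zext i4 %bits to i32
    ret i32 %res
  }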