| Commit message | Author | Age | Files | Lines |
At O3 we are more willing to increase size if we believe it will improve
performance. The current threshold for tail-duplication of 2 instructions is
conservative, and can be relaxed at O3.
Benchmark results:
llvm test-suite:
6% improvement in aha, due to duplication of loop latch
3% improvement in hexxagon
2% slowdown in lpbench. Seems related, but couldn't completely diagnose.
Internal Google benchmark:
Produces 4% improvement on internal Google protocol buffer serialization
benchmarks.
Differential-Revision: https://reviews.llvm.org/D32324
llvm-svn: 303084
|
ReadMem/WriteMem (PR32146)
Follow up to D33147
NVPTXTargetLowering::LowerCall was trusting the default argument values.
Fixes another 17 of the NVPTX '-verify-machineinstrs with EXPENSIVE_CHECKS' errors in PR32146.
Differential Revision: https://reviews.llvm.org/D33189
llvm-svn: 303082
|
The table should include only defined symbols.
llvm-svn: 303075
|
add/sub/mul
llvm-svn: 303074
|
This patch enables fusing dependent AESE/AESMC and AESD/AESIMC
instruction pairs on Cortex-A72, as recommended in the Software
Optimization Guide, section 4.10.
llvm-svn: 303073
|
See bug 32936: https://bugs.llvm.org//show_bug.cgi?id=32936
Reviewers: artem.tamazov, vpykhtin
Differential Revision: https://reviews.llvm.org/D33123
llvm-svn: 303070
|
llvm-svn: 303069
|
llvm-svn: 303059
|
This instruction does not really exist
See Bug 33018: https://bugs.llvm.org//show_bug.cgi?id=33018
Reviewers: vpykhtin, artem.tamazov
Differential Revision: https://reviews.llvm.org/D33126
llvm-svn: 303055
|
Doing this means that if an LEApcrel is used in two places we will rematerialize
instead of generating two MOVs. This is particularly useful for printfs using
the same format string, where we want to generate an address into a register
that's going to get corrupted by the call.
Differential Revision: https://reviews.llvm.org/D32858
llvm-svn: 303054
|
Doing this lets us hoist it out of loops, and I've also marked it as
rematerializable the same as the thumb1 and thumb2 counterparts.
Marking it as such looks like a mistake: the commit that made that change
only mentions LEApcrelJT, and in thumb1 and thumb2 only the LEApcrelJT
instructions were marked as having side-effects. The intent was apparently
to mark only LEApcrelJT as having side-effects, and LEApcrel was marked
accidentally as well.
Differential Revision: https://reviews.llvm.org/D32857
llvm-svn: 303053
|
intrinsics to run also on -O0 option.
Currently, when masked load, store, gather or scatter intrinsics are used, the CodeGenPrepare pass checks whether the subtarget supports them; if not, it replaces them with scalar code. This is a functional transformation, not an optimization, so it is not optional.
The CodeGenPrepare pass does not run when the optimization level is set to CodeGenOpt::None (-O0).
A functional transformation should run at all optimization levels, so this change creates a new pass that runs at every optimization level and does nothing more than this transformation.
Differential Revision: https://reviews.llvm.org/D32487
llvm-svn: 303050
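For readers unfamiliar with what "replace them with scalar code" means here, the following is a minimal Python sketch of the lane-by-lane behaviour a scalarized masked load and store must preserve. It is not the pass's implementation; the function names, the flat `memory` list and the element-index addressing are all assumptions made for illustration.

```python
from typing import List, Sequence


def scalarized_masked_load(memory: Sequence[int], base: int,
                           mask: Sequence[bool],
                           passthru: Sequence[int]) -> List[int]:
    """Hypothetical model of a masked load expanded into per-lane scalar loads."""
    result = list(passthru)  # disabled lanes keep the pass-through value
    for lane, enabled in enumerate(mask):
        if enabled:
            # Only enabled lanes touch memory, so a disabled lane may
            # correspond to an address that is not accessible at all.
            result[lane] = memory[base + lane]
    return result


def scalarized_masked_store(memory: List[int], base: int,
                            mask: Sequence[bool],
                            values: Sequence[int]) -> None:
    """Hypothetical model of a masked store expanded into per-lane scalar stores."""
    for lane, enabled in enumerate(mask):
        if enabled:
            memory[base + lane] = values[lane]
```

Because skipping disabled lanes is part of the intrinsic's meaning rather than a speed-up, the expansion has to happen even when no optimizations are requested, which is the point the message makes.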
|
Reviewers: stoklund, grosbach, vpykhtin
Differential Revision: https://reviews.llvm.org/D32493
llvm-svn: 303044
|
We were previously silently emitting bogus data in release mode,
making it very hard to diagnose the error, or crashing with an
assert in debug mode. A proper diagnostic is now always emitted
when the value to be emitted is out of range.
llvm-svn: 303041
|
llvm-svn: 303036
|
llvm-svn: 303034
|
One didn't correctly define the regex variable, the other still had a RUN
line for FNOBUILTIN-checks, which weren't copied to the file.
llvm-svn: 303025
|
llvm-svn: 303023
|
llvm-svn: 303022
|
llvm-svn: 303021
|
can see what compare instructions are being used in the lookup table code.
I noticed the 512-bit lzcnts don't use the X86 specific lookup table code and instead use the EXPAND case in LegalizeDAG. I was toying around with fixing this and noticed it would require compare instructions that generate i1 masks and then converting from mask to vector. Then I noticed that we don't test which compares are used with avx512vl and no avx512cd.
llvm-svn: 303020
|
Remove an unneeded prefix from the 32-bit command line. Make all the 64-bit triples match. Replace ALL with X64 and remove it from the 32-bit test.
llvm-svn: 303019
|
sequences
llvm-svn: 303017
|
Running `llvm-readobj -coff-directives msvcrt.lib` resulted in this error:
Invalid data was encountered while parsing the file
This happened because some of the object files in the archive have empty
`.drectve` sections. These empty sections result in a `parse_failed` error being
returned from `COFFObjectFile::getSectionContents()`, which in turn caused
`llvm-readobj` to stop. With this change, `getSectionContents` now returns
success, and like before the resulting array is empty.
Patch by Dave Lee.
Differential Revision: https://reviews.llvm.org/D32652
llvm-svn: 303014
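The behaviour change described above can be illustrated with a small Python model. This is only a sketch: the message does not say which internal check rejected empty sections, so the bounds check below, the `(ok, contents)` return shape and the function name are assumptions, not the `COFFObjectFile` API.

```python
from typing import Tuple


def get_section_contents(file_data: bytes, offset: int,
                         size: int) -> Tuple[bool, bytes]:
    """Hypothetical model: return (ok, contents) for one section."""
    if size == 0:
        # An empty section has nothing to read, so there is nothing to
        # validate. Report success with an empty result instead of an error.
        return True, b""
    if offset < 0 or offset + size > len(file_data):
        # A non-empty section that runs past the file is still malformed.
        return False, b""
    return True, file_data[offset:offset + size]
```

With this shape, a caller that walks every object in an archive keeps going when it meets an empty `.drectve` section, and still stops on genuinely malformed data.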
|
mask.
Tweak cost model to match what lowering actually does.
llvm-svn: 303013
|
llvm-svn: 303012
|
llvm-svn: 303010
|
constant splats for vXi64 shifts.
llvm-svn: 303009
|
Shows issue with 32-bits not being able to peek through subvectors to extract constant splats
llvm-svn: 303008
|
its commuted variants.
We already had (A & ~B) | (A ^ B), but we missed the cases where the not was part of the xor.
llvm-svn: 303004
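Identities of this kind are easy to sanity-check by brute force over a small bit width. The Python sketch below verifies the fold the message says already existed, `(A & ~B) | (A ^ B) == A ^ B`, plus one variant with the not inside the xor. The title of this commit is truncated, so that second identity is an assumed example of such a case, not necessarily the exact pattern the commit handles.

```python
WIDTH_MASK = 0xF  # check every pair of 4-bit values


def bit_not(x: int) -> int:
    """Bitwise not restricted to the chosen width."""
    return ~x & WIDTH_MASK


def check_identities() -> bool:
    for a in range(WIDTH_MASK + 1):
        for b in range(WIDTH_MASK + 1):
            # Stated in the message: A & ~B only sets bits where A and B
            # differ, so or-ing it into A ^ B adds nothing.
            assert ((a & bit_not(b)) | (a ^ b)) == (a ^ b)
            # Assumed variant: A & B only sets bits where A and B agree,
            # which ~A ^ B (that is, xnor) already covers.
            assert ((a & b) | (bit_not(a) ^ b)) == (bit_not(a) ^ b)
    return True
```

Since both sides are bitwise operations, checking all 4-bit pairs covers every per-bit combination of inputs.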
|
llvm-svn: 303003
|
llvm-svn: 303000
|
llvm-svn: 302998
|
demandedelts in ComputeNumSignBits
llvm-svn: 302997
|
demandedelts support in ComputeNumSignBits
llvm-svn: 302994
|
llvm-svn: 302993
|
llvm-svn: 302992
|
Tests that use target intrinsics are inherently target specific. Mark
them as such.
llvm-svn: 302990
|
Further perf tests on Jaguar indicate that:
vxorps %ymm0, %ymm0, %ymm0
vcmpps $15, %ymm0, %ymm0, %ymm0
is consistently faster (by about 9%) than:
vpcmpeqd %xmm0, %xmm0, %xmm0
vinsertf128 $1, %xmm0, %ymm0, %ymm0
Testing equivalent code on a SandyBridge (E5-2640) puts it slightly (~3%) faster as well.
Committed on behalf of @dtemirbulatov
Differential Revision: https://reviews.llvm.org/D32416
llvm-svn: 302989
|
The loop vectorizer pass introduced an undef value while fixing up the output of LCSSA form.
Here it is:
before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ undef, %middle.block ]
and after this change we have:
before: %e.0.ph = phi i32 [ 0, %for.inc.2.i ]
after: %e.0.ph = phi i32 [ 0, %for.inc.2.i ], [ 0, %middle.block ]
Committed on behalf of @dtemirbulatov
Differential Revision: https://reviews.llvm.org/D33055
llvm-svn: 302988
|
something changed in the initial Worklist creation
Summary:
If building the Worklist causes an IR change, that change flag currently factors into the decision to run another iteration of the iteration loop. But only changes made during processing should trigger another iteration.
This patch folds the worklist-creation change flag into the flag kept outside the loop (currently used for DbgDeclares) and only sends that flag up to the caller. Rerunning the loop now depends only on IC.run().
The test uses the debug output of InstCombine to determine whether one or two iterations run. I couldn't think of a better way to detect it, since the spurious second iteration shouldn't make any visible changes; it is just wasted computation.
I can do a pre-commit of the test case with the CHECK-NOT as a CHECK if this is an ok way to check this.
This is a subset of D31678 as I'm still not sure how to verify the analysis behavior for that.
Reviewers: davide, majnemer, spatel, chandlerc
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D32453
llvm-svn: 302982
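The control flow described in the summary can be sketched in a few lines of Python. This is a simplified model rather than InstCombine's code: `build_worklist` and `run_once` are hypothetical callables standing in for worklist creation and `IC.run()`, each returning whether it changed the IR.

```python
from typing import Callable, Tuple


def combine_until_fixpoint(build_worklist: Callable[[], bool],
                           run_once: Callable[[], bool],
                           dbg_declares_changed: bool = False
                           ) -> Tuple[bool, int]:
    """Return (made_ir_change, iterations) for the modelled driver loop."""
    # Flag kept outside the loop; it is reported to the caller but never
    # consulted when deciding whether to iterate again.
    made_ir_change = dbg_declares_changed
    iterations = 0
    while True:
        iterations += 1
        # A change made while building the worklist is remembered...
        made_ir_change |= build_worklist()
        # ...but only a change made during processing triggers another pass.
        if not run_once():
            break
        made_ir_change = True
    return made_ir_change, iterations
```

In this model, a worklist build that changes the IR followed by a processing step that changes nothing ends after one iteration while still reporting a change to the caller.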
|
This allows us to mark this as `REQUIRES: x86`, since it uses x86
target specific intrinsics.
llvm-svn: 302980
|
Tests with target intrinsics are inherently target specific, so it
doesn't actually make sense to run them if we've excluded their
target.
llvm-svn: 302979
|
llvm-svn: 302977
|
I bet the change is correct but this test seems to expose some underlying
problem that manifests only on some buildbots, and I'm not able to reproduce
locally. Unfortunately I can't debug right now but I don't want to annoy
people with spurious failures, so I'll XFAIL until I can take a look (over
the weekend).
llvm-svn: 302976
|
Contributed by Dr. Gergő Érdi.
Fixes a bug.
Raised from (https://github.com/avr-rust/rust/issues/49).
llvm-svn: 302973
|
Update a few tests to use llvm.masked.load/store instead of arm neon
vector loads and stores, and move the tests that are actually specific
to those arm intrinsics to their own files. This lets us mark the
tests that use target specific intrinsics as requiring those targets.
llvm-svn: 302972
|
Implemented frequency-based cost/saving analysis
and related options.
The pass is now in a state ready to be turned on
in the pipeline (in a follow-up).
Differential Revision: http://reviews.llvm.org/D32783
llvm-svn: 302967
|
to SVML routines
Patch by Chris Chrulski
Differential Revision: https://reviews.llvm.org/D31789
llvm-svn: 302957
|
generated from -ffast-math
Patch by Chris Chrulski
Differential Revision: https://reviews.llvm.org/D31788
llvm-svn: 302956
|