| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
| |
This broke the lit tests on a bunch of buildbots, e.g.
http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-ubuntu-fast/builds/36679
> Reviewed By: MatzeB
>
> Differential Revision: https://reviews.llvm.org/D51495
llvm-svn: 342482
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up to the previous patch that eliminated some of the rotates.
With this addition, we will also emit the record-form andis.
This patch increases the number of record-form rotates we eliminate by
more than 70%.
Differential revision: https://reviews.llvm.org/D44897
llvm-svn: 342478
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Currently only the first function in the module is checked to
see if it has remarks enabled. If that first function is a declaration,
remarks will be incorrectly skipped. Change to look for the first
non-empty function.
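The change described above can be modelled in a few lines. This is only a toy sketch: `Function`, `is_declaration` and `remarks_enabled` are hypothetical stand-ins, not LLVM classes or fields.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Function:
    # Hypothetical stand-in for a function in a module.
    name: str
    is_declaration: bool          # True when the function has no body
    remarks_enabled: bool = False


def first_defined_function(functions: Sequence[Function]) -> Optional[Function]:
    """Return the first function that has a body, or None if there is none."""
    return next((f for f in functions if not f.is_declaration), None)


def module_has_remarks(functions: Sequence[Function]) -> bool:
    """Probe the first non-empty function instead of blindly using functions[0]."""
    probe = first_defined_function(functions)
    return probe is not None and probe.remarks_enabled


# A module whose first entry is a declaration: the old "check functions[0]"
# approach would look at printf and wrongly conclude remarks are off.
module = [Function("printf", True), Function("main", False, remarks_enabled=True)]
```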
Reviewers: pcc
Subscribers: mehdi_amini, inglorion, eraman, steven_wu, dexonsmith, llvm-commits
Differential Revision: https://reviews.llvm.org/D51556
llvm-svn: 342477
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds LLVMAddUnifyFunctionExitNodesPass to expose
createUnifyFunctionExitNodesPass to the C and OCaml APIs.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52212
llvm-svn: 342476
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds LLVMAddLowerAtomicPass to expose createLowerAtomicPass in the C
and OCaml APIs.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D52211
llvm-svn: 342475
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Both ANDIo and ANDISo (and the 64-bit versions) are record-form instructions.
When optimizing compares, we handle the former in order to eliminate the compare
instruction but not the latter. This patch just adds the latter to the set of
instructions we optimize.
The reason these instructions need to be handled separately is that they are not
part of the RecFormRel map (since they don't have a non-record-form). The
missing "and-immediate-shifted" is just an oversight in the initial
implementation.
Differential revision: https://reviews.llvm.org/D51353
llvm-svn: 342472
|
| |
|
|
|
|
|
|
|
| |
Since Android API version 9 the Android libm has had the sincos functions, so
they should be recognised as libcalls and sincos optimisation should be applied.
Differential Revision: https://reviews.llvm.org/D52025
llvm-svn: 342471
|
| |
|
|
|
|
| |
calls. NFCI.
llvm-svn: 342462
|
| |
|
|
|
|
|
|
| |
Reviewed By: MatzeB
Differential Revision: https://reviews.llvm.org/D51495
llvm-svn: 342457
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This tries to make use of evaluateAsRelocatable in AArch64AsmParser::classifySymbolRef
to parse more complex expressions as relocatable operands. It is hopefully better than
the existing code which only handles Symbol +- Constant.
This allows us to parse more complex adr/adrp, mov, ldr/str and add operands. It also
loosens the requirements on parsing addends in ld/st and mov's and adds a number of
tests.
Differential Revision: https://reviews.llvm.org/D51792
llvm-svn: 342455
|
| |
|
|
|
|
|
|
|
|
|
| |
A piece of logic in rewriteLoopExitValues has a weird check on the number of
users, which allowed an unprofitable transform when an instruction has
more than 6 users.
Differential Revision: https://reviews.llvm.org/D51404
Reviewed By: etherzhhb
llvm-svn: 342444
|
| |
|
|
|
|
|
| |
If there is a single-use constant, it can be folded into the
min/max, but not into med3.
llvm-svn: 342443
|
| |
|
|
|
|
|
|
|
|
| |
This was checking the hardcoded address space 0 for the stack.
Additionally, this should be checking for legality with
the adjusted alignment, so defer the alignment check.
Also try to split if the unaligned access isn't allowed.
llvm-svn: 342442
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
While doing some instruction scheduling work, we noticed some missing itineraries.
Before we switch to the machine scheduler, those missing itineraries may have no impact on actual scheduling,
because we still get the same latency from the default values.
With the machine scheduler, however, itineraries do affect scheduling.
E.g., NumMicroOps defaults to 0 if there are no itineraries for a specific instruction class,
whereas most instruction classes with itineraries have NumMicroOps default to 1.
This affects the count of RetiredMOps and the Pending/Available queues,
which in turn causes different or suboptimal scheduling.
Patch By: jsji (Jinsong Ji)
Differential Revision: https://reviews.llvm.org/D52040
llvm-svn: 342441
|
| |
|
|
| |
llvm-svn: 342439
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch adds LLVMIsLiteralStruct to the C API to expose
StructType::isLiteral. This is then used to implement the analogous
addition to the OCaml API.
Reviewers: whitequark, deadalnix
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52209
llvm-svn: 342435
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
ConstantExpr supports getIndices, but prior to this patch
LLVMGetNumIndices and LLVMGetIndices would error on them.
Reviewers: whitequark
Reviewed By: whitequark
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52206
llvm-svn: 342434
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts r342395 as it caused the error
> Argument value type does not match pointer operand type!
> %0 = atomicrmw volatile xchg i8* %_Value1, i32 1 monotonic, !dbg !25
> i8in function atomic_flag_test_and_set
> fatal error: error in backend: Broken function found, compilation aborted!
on bot http://green.lab.llvm.org/green/job/clang-stage1-configure-RA/
More details are available at https://reviews.llvm.org/D52080
llvm-svn: 342431
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
failures easier to track.
Summary:
EarlyCSE can make IR changes that will leave MemorySSA with accesses claiming to be optimized, but for which a subsequent MemorySSA run will yield a different optimized result.
Due to relying on AA queries, we can't fix this in general, unless we recompute MemorySSA.
Adding some tests to track this and a basic verify for future potential failures.
Reviewers: george.burgess.iv, gberry
Subscribers: sanjoy, jlebar, Prazek, llvm-commits
Differential Revision: https://reviews.llvm.org/D51960
llvm-svn: 342422
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add support for mips64(el)-linux-gnuabin32 triples, and set them to N32.
The Debian architecture names mipsn32/mipsn32el are also added. Set
UseIntegratedAssembler for N32 if we can detect it.
Patch by YunQiang Su.
Differential revision: https://reviews.llvm.org/D51408
llvm-svn: 342416
|
| |
|
|
|
|
|
|
|
|
|
| |
Previously we would dump the names of enum types, but not their
enumerator values. This adds support for enumerator values. In
doing so, we have to introduce a general purpose mechanism for
caching symbol indices of field list members. Unlike global
types, FieldList members do not have a TypeIndex. So instead,
we identify them by the pair {TypeIndexOfFieldList, IndexInFieldList}.
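A cache keyed by that pair might look like the sketch below. It is a minimal illustration only: the class name, the method name and the way IDs are allocated are assumptions, not the actual implementation.

```python
from typing import Dict, Tuple

# (type index of the field list, position of the member inside that list)
FieldListKey = Tuple[int, int]


class FieldListSymbolCache:
    """Hands out a stable symbol ID for each field list member."""

    def __init__(self) -> None:
        self._ids: Dict[FieldListKey, int] = {}

    def get_or_create(self, field_list_ti: int, index_in_list: int) -> int:
        key = (field_list_ti, index_in_list)
        if key not in self._ids:
            # First time we see this member: allocate the next free ID.
            self._ids[key] = len(self._ids) + 1
        return self._ids[key]


cache = FieldListSymbolCache()
first = cache.get_or_create(0x1000, 0)   # enumerator 0 of field list 0x1000
second = cache.get_or_create(0x1000, 1)  # enumerator 1 of the same list
```

Looking up the same pair again returns the ID that was handed out the first time, which is what lets members without a TypeIndex be cached like other symbols.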
llvm-svn: 342415
|
| |
|
|
|
|
|
|
| |
Previously for cv-qualified types, we would just ignore them
and they would never get printed. Now we can enumerate them
and cache them like any other symbol type.
llvm-svn: 342414
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Defer unnecessary early inlining of constants to symbol
variants. Fixes PR38945.
Reviewers: nickdesaulniers, rnk
Subscribers: nemanjai, hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D52188
llvm-svn: 342412
|
| |
|
|
|
|
|
|
|
|
|
|
| |
getLoopID has different control flow for two cases: when there is a
single loop latch, and when there is any other number of loop latches (zero
or more than one). The latter case should return the same result if there is
only a single latch. We can save the preceding redundant search for a
latch by handling both cases with the same code.
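The unified handling can be sketched as a single loop over all latches. This is a simplified model, not LLVM's code: blocks are plain strings, and the `successors` and `loop_md` dictionaries are hypothetical representations of the CFG and of the loop ID attached to each block's terminator.

```python
from typing import Dict, List, Optional


def get_loop_id(header: str,
                successors: Dict[str, List[str]],
                loop_md: Dict[str, Optional[str]]) -> Optional[str]:
    """Return the loop ID shared by every latch, or None.

    successors maps each block of the loop to its successor blocks;
    loop_md maps a block to the loop ID on its terminator, if any.
    """
    loop_id: Optional[str] = None
    for block, succs in successors.items():
        if header not in succs:
            continue                 # not a latch: no back edge to the header
        md = loop_md.get(block)
        if md is None:
            return None              # a latch without a loop ID
        if loop_id is None:
            loop_id = md             # first latch seen
        elif md != loop_id:
            return None              # latches disagree
    return loop_id                   # stays None when there are no latches
```

With exactly one latch the loop body runs once for it and returns its ID, so no separate single-latch path is needed.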
Differential Revision: https://reviews.llvm.org/D52118
llvm-svn: 342406
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were mapping an instruction every time we saw something we couldn't map
before this. Since each illegal mapping is unique, we only have to do this once.
This makes it so that we don't map illegal instructions when the previous
mapped instruction was illegal.
In CTMark (AArch64), this results in 240 fewer instruction mappings on
average over 619 files in total. The largest improvement is 12576 fewer
mappings in one file, and the smallest is 0. The median improvement is 101
fewer mappings.
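The idea of mapping only the first instruction in a run of illegal ones can be sketched as follows. This is a toy model under stated assumptions: instructions are strings, `is_legal` is a hypothetical predicate, and negative numbers stand in for the unique illegal IDs.

```python
from typing import Callable, Dict, List, Sequence


def map_instructions(instrs: Sequence[str],
                     is_legal: Callable[[str], bool]) -> List[int]:
    legal_ids: Dict[str, int] = {}
    next_illegal = -1            # illegal IDs count down, so they never collide
    mapping: List[int] = []
    prev_illegal = False
    for inst in instrs:
        if is_legal(inst):
            # Identical legal instructions share one ID.
            mapping.append(legal_ids.setdefault(inst, len(legal_ids)))
            prev_illegal = False
        elif not prev_illegal:
            # First illegal instruction of a run: emit one unique separator.
            mapping.append(next_illegal)
            next_illegal -= 1
            prev_illegal = True
        # Otherwise the previous mapping was already illegal: skip this one.
    return mapping


result = map_instructions(["add", "call", "ret", "add", "add", "call"],
                          lambda inst: inst == "add")
```

Here `call` and `ret` form one illegal run and produce a single mapping, so six instructions yield five mappings.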
llvm-svn: 342405
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The IR reference for the `byval` attribute states:
```
This indicates that the pointer parameter should really be passed by value
to the function. The attribute implies that a hidden copy of the pointee is
made between the caller and the callee, so the callee is unable to modify
the value in the caller. This attribute is only valid on LLVM pointer arguments.
```
However, on Win64, this attribute is unimplemented and the raw pointer is
passed to the callee instead. This is problematic, because frontend authors
relying on the implicit hidden copy (as happens for every other calling
convention) will see the passed value silently (if mutable memory) or
loudly (by means of a crash) modified because the callee treats the
location as scratch memory space it is allowed to mutate.
At this point, it's worth taking a step back to understand the context.
In most calling conventions, aggregates that are too large to be passed
in registers instead get *copied* to the stack at a fixed (computable
from the signature) offset of the stack pointer. At the LLVM level, we hide
this copy behind the byval attribute. The caller passes a pointer
to the desired data and the callee receives a pointer, but these pointers
are not the same. In particular, the pointer that the callee receives
points to temporary stack memory allocated as part of the call lowering.
In most calling conventions, this pointer is never realized in registers
or memory. The temporary memory is simply defined by an implicit
offset from the stack pointer at function entry.
Win64, uniquely, works differently. The structure is still passed in
memory, but instead of being stored at an implicit memory offset, the
caller computes a pointer to the temporary memory and passes it to
the callee as a regular pointer (taking up a register, or if all
registers are taken up, an additional stack slot). Presumably, this
was done to allow eliding the copy when passing aggregates through
several functions on the stack.
This explains why ignoring the `byval` attribute mostly works on Win64.
The argument simply gets passed as a pointer and as long as we're ok
with the callee trampling all over that memory, there are no ill effects.
However, it does contradict the documentation of the `byval` attribute
which specifies that there is to be an implicit copy.
Frontends can of course work around this by never emitting the `byval`
attribute for Win64 and creating `alloca`s for the requisite temporary
stack slots (and that does appear to be what frontends are doing).
However, the presence of the `byval` attribute is a trap for
frontend authors, since it seems to work, but silently modifies the
passed memory contrary to documentation.
I see two solutions:
- Disallow the `byval` attribute in the verifier if using the Win64
calling convention.
- Make it work by simply emitting a temporary stack copy as we would
with any other calling convention (frontends can of course always
not use the attribute if they want to elide the copy).
This patch implements the second option (make it work), though I would
be fine with the first also.
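The difference between the documented `byval` behaviour and passing the raw pointer can be illustrated with a toy model. Nothing here is LLVM code: a `bytearray` stands in for the caller's memory, and `call_byval` and `call_raw_pointer` are hypothetical names for the two lowering strategies.

```python
from typing import Callable


def callee(buf: bytearray) -> None:
    # The callee treats its argument as scratch space.
    buf[0] = 0xFF


def call_byval(fn: Callable[[bytearray], None], arg: bytearray) -> None:
    # Documented byval semantics: the callee gets a hidden temporary copy.
    tmp = bytearray(arg)
    fn(tmp)


def call_raw_pointer(fn: Callable[[bytearray], None], arg: bytearray) -> None:
    # Ignoring byval: the callee writes straight into the caller's memory.
    fn(arg)


data = bytearray(b"\x01\x02")
call_byval(callee, data)          # caller's data is untouched
after_byval = bytes(data)
call_raw_pointer(callee, data)    # caller's data is modified
after_raw = bytes(data)
```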
Ref: https://github.com/JuliaLang/julia/issues/28338
Reviewers: rnk
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51842
llvm-svn: 342402
|
| |
|
|
|
|
|
|
| |
I need to use it in the GCN codegen.
Differential Revision: https://reviews.llvm.org/D52123
llvm-svn: 342400
|
| |
|
|
|
|
|
|
|
| |
buildbot errors. Adjusted 2 test cases for ARM and darwin and fixed a bug with the original change in dsymutil."
This reverts commit r342218 due to a number of failures under TSAN. An isolated
test case is being worked on.
llvm-svn: 342399
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This handles terminator instructions and more.
Summary:
I tested this patch by compiling sqlite3.ll (clang -O3 -mllvm -disable-llvm-optzns sqlite3.c.)
opt -called-value-propagation sqlite3.ll -time-passes -f -o out.ll
I get 10+% speedup for the pass. I expect some of the gain comes from skipping terminator instructions.
=== BEFORE THE PATCH ===
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.5562 seconds (0.5582 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2485 ( 46.4%) 0.0120 ( 57.7%) 0.2605 ( 46.8%) 0.2615 ( 46.8%) Bitcode Writer
0.1607 ( 30.0%) 0.0079 ( 37.7%) 0.1685 ( 30.3%) 0.1693 ( 30.3%) Called Value Propagation
0.1262 ( 23.6%) 0.0009 ( 4.5%) 0.1271 ( 22.9%) 0.1275 ( 22.8%) Module Verifier
0.5353 (100.0%) 0.0209 (100.0%) 0.5562 (100.0%) 0.5582 (100.0%) Total
=== AFTER THE PATCH ===
===-------------------------------------------------------------------------===
... Pass execution timing report ...
===-------------------------------------------------------------------------===
Total Execution Time: 0.5338 seconds (0.5355 wall clock)
---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name ---
0.2498 ( 48.6%) 0.0118 ( 59.3%) 0.2615 ( 49.0%) 0.2629 ( 49.1%) Bitcode Writer
0.1377 ( 26.8%) 0.0075 ( 37.8%) 0.1452 ( 27.2%) 0.1455 ( 27.2%) Called Value Propagation
0.1264 ( 24.6%) 0.0006 ( 3.0%) 0.1270 ( 23.8%) 0.1271 ( 23.7%) Module Verifier
0.5139 (100.0%) 0.0199 (100.0%) 0.5338 (100.0%) 0.5355 (100.0%) Total
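As a quick check of the "10+%" figure, the User+System times of the Called Value Propagation rows in the two reports above can be compared directly. The variable names are illustrative only.

```python
# User+System seconds for "Called Value Propagation", copied from the reports.
before = 0.1685
after = 0.1452

reduction = (before - after) / before   # fraction of the pass time saved
speedup = before / after                # ratio of old time to new time

print(f"time reduction: {reduction:.1%}, speedup: {speedup:.2f}x")
```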
Reviewers: davide, mssimpso
Reviewed By: davide
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D49108
llvm-svn: 342398
|
| |
|
|
|
|
|
|
| |
extract_subvector with invalid index.""
Fixed the assertion failure.
llvm-svn: 342397
|
| |
|
|
|
|
|
| |
Removes the redundant UnitType parameter from verifyUnitContents. I also
fixed some formatting issues as I was touching the file.
llvm-svn: 342396
|
| |
|
|
|
|
|
|
|
|
|
|
| |
isSupportedValue explicitly checked and accepted many types of value,
primarily for debugging reasons. Remove most of these checks and do a
bit of refactoring now that the pass is more stable. This also enables
ZExts to be sources, but this has very little practical benefit at the
moment, since extend instructions will still be introduced.
Differential Revision: https://reviews.llvm.org/D52080
llvm-svn: 342395
|
| |
|
|
|
|
|
|
|
|
| |
We allow overflowing instructions if they're decreasing and only used
by an unsigned compare. Add the extra condition that the icmp cannot
be using a negative immediate.
Differential Revision: https://reviews.llvm.org/D52102
llvm-svn: 342392
|
| |
|
|
| |
llvm-svn: 342390
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Rebase rL341954 since https://bugs.llvm.org/show_bug.cgi?id=38912
has been fixed by rL342055.
Precommit testing performed:
* Overnight runs of csmith comparing the output between programs
compiled with gvn-hoist enabled/disabled.
* Bootstrap builds of clang with UbSan/ASan configurations.
llvm-svn: 342387
|
| |
|
|
|
|
|
|
| |
This patch fixes calculating the address of a label for non-pic ppc64.
Differential Revision: https://reviews.llvm.org/D50965
llvm-svn: 342368
|
| |
|
|
| |
llvm-svn: 342360
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The std::vector::iterator type may be a pointer, in which case
iterator::value_type fails to compile since iterator is not a class,
namespace, or enumeration.
Patch by orivej (Orivej Desh)
Differential Revision: https://reviews.llvm.org/D52142
llvm-svn: 342354
|
| |
|
|
|
|
| |
For constant non-uniform cases we'll never introduce more and/andn/or selects than already occur in generic pre-SSE41 ISD::SRL lowering.
llvm-svn: 342352
|
| |
|
|
|
|
| |
Now that rL340913 has landed with improved v16i16 selects as shuffles.
llvm-svn: 342349
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-up suggested in D51630 and originally proposed as an IR transform in D49040.
Copying the motivational statement by @evandro from that patch:
"This transformation helps some benchmarks in SPEC CPU2000 and CPU2006, such as 188.ammp,
447.dealII, 453.povray, and especially 300.twolf, as well as some proprietary benchmarks.
Otherwise, no regressions on x86-64 or A64."
I'm proposing to add only the minimum support for a DAG node here. Since we don't have an
LLVM IR intrinsic for cbrt, and there are no other DAG ways to create a FCBRT node yet, I
don't think we need to worry about DAG builder, legalization, a strict variant, etc. We
should be able to expand as needed when adding more functionality/transforms. For reference,
these are transform suggestions currently listed in SimplifyLibCalls.cpp:
// * cbrt(expN(X)) -> expN(x/3)
// * cbrt(sqrt(x)) -> pow(x,1/6)
// * cbrt(cbrt(x)) -> pow(x,1/9)
Also, given that we bail out on long double for now, there should not be any logical
differences between platforms (unless there's some platform out there that has pow()
but not cbrt()).
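The three identities quoted from SimplifyLibCalls.cpp can be sanity-checked numerically for positive inputs. This is only a floating-point spot check, not a proof and not how the compiler reasons; the local `cbrt` helper is an assumption used to avoid depending on a particular library version, and `exp` stands in for the `expN` family.

```python
import math


def cbrt(x: float) -> float:
    """Cube root for non-negative x, which is all this check needs."""
    return x ** (1.0 / 3.0)


for x in (0.5, 1.0, 2.0, 10.0, 123.456):
    # cbrt(expN(x)) -> expN(x/3), shown here with the natural exponential
    assert math.isclose(cbrt(math.exp(x)), math.exp(x / 3.0), rel_tol=1e-9)
    # cbrt(sqrt(x)) -> pow(x, 1/6)
    assert math.isclose(cbrt(math.sqrt(x)), x ** (1.0 / 6.0), rel_tol=1e-9)
    # cbrt(cbrt(x)) -> pow(x, 1/9)
    assert math.isclose(cbrt(cbrt(x)), x ** (1.0 / 9.0), rel_tol=1e-9)
```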
Differential Revision: https://reviews.llvm.org/D51753
llvm-svn: 342348
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://bugs.llvm.org/show_bug.cgi?id=38949
It's not clear to me that we even need a one-use check in this fold.
I.e., 2 independent loads might be better than a load+dependent shuffle.
Note that the existing re-use tests are not affected. We actually do form a
broadcast node in those tests now because there's no extra use of the
insert_subvector node in those cases. But something later in isel pattern
matching decides that it is not worth using a broadcast for the full load in
those tests:
Legalized selection DAG: %bb.0 'test_broadcast_2f64_4f64_reuse:'
t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
t18: v4f64 = insert_subvector undef:v4f64, t7, Constant:i64<0>
t20: v4f64 = insert_subvector t18, t7, Constant:i64<2>
Becomes:
t7: v2f64,ch = load<(load 16 from %ir.p0)> t0, t2, undef:i64
t4: i64,ch = CopyFromReg t0, Register:i64 %1
t10: ch = store<(store 16 into %ir.p1)> t7:1, t7, t4, undef:i64
t21: v4f64 = X86ISD::SUBV_BROADCAST t7
ISEL: Starting selection on root node: t21: v4f64 = X86ISD::SUBV_BROADCAST t7
...
Created node: t27: v4f64 = INSERT_SUBREG IMPLICIT_DEF:v4f64, t7, TargetConstant:i32<7>
Morphed node: t21: v4f64 = VINSERTF128rr t27, t7, TargetConstant:i8<1>
llvm-svn: 342347
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(sub (zext x), (zext y)) --> (zext (sub x, y))
Summary:
If the sub doesn't overflow in the original type we can move it above the sext/zext.
This is similar to what we do for add. The overflow checking for sub is currently weaker than add, so the test cases are constructed for what is supported.
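The zext form of the fold, and why it needs the no-overflow condition, can be checked exhaustively for small widths. The sketch below models 8-bit values widened to 32 bits; the helper names and the chosen widths are assumptions made for illustration.

```python
NARROW, WIDE = 8, 32


def zext(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low from_bits and read them as unsigned."""
    return value & ((1 << from_bits) - 1)


def sub(a: int, b: int, bits: int) -> int:
    """Wrapping subtraction at the given bit width."""
    return (a - b) & ((1 << bits) - 1)


def wide_then_sub(x: int, y: int) -> int:
    # (sub (zext x), (zext y))
    return sub(zext(x, NARROW), zext(y, NARROW), WIDE)


def sub_then_wide(x: int, y: int) -> int:
    # (zext (sub x, y))
    return zext(sub(x, y, NARROW), NARROW)


# Collect every 8-bit input pair where the two forms disagree.
mismatches = [(x, y) for x in range(256) for y in range(256)
              if wide_then_sub(x, y) != sub_then_wide(x, y)]

# The forms only differ when the narrow subtraction wraps (x < y).
assert all(x < y for x, y in mismatches)
```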
Reviewers: spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52075
llvm-svn: 342335
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Naively computing the hash after the PDB data has been generated is in practice
as fast as other approaches I tried. I also tried online-computing the hash as
parts of the PDB were written out (https://reviews.llvm.org/D51887; that's also
where all the measuring data is) and computing the hash in parallel
(https://reviews.llvm.org/D51957). This approach here is simplest, without
being slower.
Differential Revision: https://reviews.llvm.org/D51956
llvm-svn: 342333
|
| |
|
|
|
|
|
|
|
|
|
| |
* Use same method of initializing the output stream and its buffer
* Allow a nullptr Status pointer
* Don't print the mangled name on demangling error
* Write to N (if it is non-nullptr)
Differential Revision: https://reviews.llvm.org/D52104
llvm-svn: 342330
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This unfortunately adds a move, but isn't that better than going to the int domain and back?
Reviewers: RKSimon
Reviewed By: RKSimon
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52134
llvm-svn: 342327
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
MOVMSK only cares about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load.
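The equivalence can be modelled on plain integers. This is a simplified sketch assuming 32-bit lanes and a single tested bit per lane; `movmsk`, `via_setcc` and `via_shift` are illustrative names, not instructions or LLVM functions.

```python
LANE_BITS = 32
SIGN = 1 << (LANE_BITS - 1)
MASK = (1 << LANE_BITS) - 1


def movmsk(lanes):
    """Pack the sign bit of every lane into an integer, lane 0 in bit 0."""
    return sum(1 << i for i, lane in enumerate(lanes) if lane & SIGN)


def via_setcc(lanes, bit):
    # Compare style: each lane first becomes all ones or all zeros.
    return movmsk([MASK if (lane >> bit) & 1 else 0 for lane in lanes])


def via_shift(lanes, bit):
    # Shift style: move the tested bit into the sign position instead.
    return movmsk([(lane << (LANE_BITS - 1 - bit)) & MASK for lane in lanes])


lanes = [0b0100, 0b0011, 0xFFFFFFFF, 0]
```

Because only the sign bit of each lane reaches the result, whatever the shift leaves in the lower bits does not matter.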
Inspired by PR38840.
Reviewers: RKSimon, spatel
Reviewed By: RKSimon
Subscribers: lebedev.ri, llvm-commits
Differential Revision: https://reviews.llvm.org/D52121
llvm-svn: 342326
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(PR38814)
Missing optimizations with blendv are shown in:
https://bugs.llvm.org/show_bug.cgi?id=38814
If this works, it's an easier and more powerful solution than adding pattern matching
for a few special cases in the backend. The potential danger with this transform in IR
is that the condition value can get separated from the select, and the backend might
not be able to make a blendv out of it again. I don't think that's too likely, but
I've kept this patch minimal with a 'TODO', so we can test that theory in the wild
before expanding the transform.
Differential Revision: https://reviews.llvm.org/D52059
llvm-svn: 342324
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
It is sometimes important to check that some newly-computed value
is non-negative and only n bits wide (where n is a variable.)
There are many ways to check that:
https://godbolt.org/z/o4RB8D
The last variant seems best?
(I'm sure there are some other variations I haven't thought of.)
The last pattern (as far as I know?) is non-canonical due to the extra use.
https://godbolt.org/z/aCMsPk
https://rise4fun.com/Alive/I6f
https://bugs.llvm.org/show_bug.cgi?id=38708
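One well-known way to phrase the "non-negative and only n bits wide" check is a single logical shift. The sketch below compares it with the straightforward range check on 8-bit values; it is not necessarily any of the variants behind the links above, and the function names are made up for illustration.

```python
BITS = 8


def to_signed(u: int) -> int:
    """Read an 8-bit pattern as a two's complement signed value."""
    return u - (1 << BITS) if u & (1 << (BITS - 1)) else u


def check_range(u: int, n: int) -> bool:
    # Direct form: the signed value is non-negative and below 2**n.
    x = to_signed(u)
    return 0 <= x < (1 << n)


def check_shift(u: int, n: int) -> bool:
    # Shift form: no bits remain at position n or above.
    return (u >> n) == 0


# The two forms agree for every bit pattern as long as n < BITS.
agree = all(check_range(u, n) == check_shift(u, n)
            for u in range(1 << BITS) for n in range(BITS))
```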
Reviewers: spatel, craig.topper, RKSimon
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D52062
llvm-svn: 342321
|
| |
|
|
|
|
|
|
|
| |
CodeGenPrepare has a transform that sinks {lshr, trunc} pairs to make it
easier for the backend to emit fancy extract-bits instructions (e.g. UBFX).
Teach it to preserve debug locations and salvage debug values.
llvm-svn: 342319
|