| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
It's okay to *not* copy the trailing zero of a windows section/symbol name.
This is compatible with strncpy behavior but gcc doesn't know that and
throws an invalid warning. Encode this behavior in a proper function.
Differential Revision: https://reviews.llvm.org/D66420
llvm-svn: 369501
|
| |
|
|
| |
llvm-svn: 369490
|
| |
|
|
|
|
|
|
| |
We do this by merging the source with the high bits set to 0.
Differential Revision: https://reviews.llvm.org/D66181
llvm-svn: 369480
|
| |
|
|
|
|
|
|
| |
Removed code that added program code csects to a collection as part
of addressing review comments, but I failed to update an assert affected
by the change before commiting.
llvm-svn: 369471
|
| |
|
|
|
|
|
|
|
|
|
|
| |
For an internal function, if all its call sites are dead, the body of the function is considered dead.
Reviewers: jdoerfert, uenoku
Subscribers: hiraditya, llvm-commits
Differential Revision: https://reviews.llvm.org/D66155
llvm-svn: 369470
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D66503
llvm-svn: 369468
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Add an initial GlobalISel skeleton for RISCV. It can only run ir translator for `ret void`.
Patch by Andrew Wei
Reviewers: asb, sabuasal, apazos, lenary, simoncook, lewis-revill, edward-jones, rogfer01, xiangzhai, rovka, Petar.Avramovic, mgorny, dsanders
Reviewed By: dsanders
Subscribers: pzheng, s.egerton, dsanders, hiraditya, rbar, johnrusso, niosHD, kito-cheng, shiva0217, jrtc27, MaskRay, zzheng, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, psnobl, benna, Jim, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65219
llvm-svn: 369467
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Make Phi cleanups consistent: remove self as a trivial Phi and
recurse to potentially remove other trivial phis.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66454
llvm-svn: 369466
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a GlobalISel equivalent for the logical_imm32_XFORM and logical_imm64_XFORM
SDNodeXForms in AArch64InstrFormats.td.
- Add select-logical-imm.mir, which contains tests for each imported pattern.
- Update select-pr32733.mir and select-scalar-shift-imm.mir, since they now
select instructions of this form.
Differential Revision: https://reviews.llvm.org/D66162
llvm-svn: 369465
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When inserting a new Def, and inserting Phis in the IDF when needed,
also mark the already existing Phis in the IDF as non-optimized, since
these may need fixing as well.
In the test attached, there is a Phi in the IDF that happens to be
trivial, and is wrongfully removed by the call to getLastDef that
follows. This is a valid situation and the existing IDF Phis need to
marked as "may need fixing" as well.
Resolves PR43044.
Reviewers: george.burgess.iv
Subscribers: Prazek, sanjoy.google, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66495
llvm-svn: 369464
|
| |
|
|
|
|
|
| |
Remove assert of 'Sec->getCSectType() <= 0x07u' added in r369454, since its
always true.
llvm-svn: 369462
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds GlobalISel equivalents for the following from AArch64InstrFormats:
- arith_shifted_reg32
- arith_shifted_reg64
And partial support for
- logical_shifted_reg32
- logical_shifted_reg32
The only thing missing for the logical cases is support for rotates. Other than
the missing support, the transformation is identical for the arithmetic shifted
register and the logical shifted register.
Lots of tests here:
- Add select-arith-shifted-reg.mir to show that we correctly select add and
sub instructions which use this pattern.
- Add select-logical-shifted-reg.mir to cover patterns which are not shared
between the arithmetic and logical cases.
- Update addsub-shifted.ll to show that we correctly fold shifts into
adds/subs.
- Update eon.ll to show that we can select the eon instruction by folding xors.
Differential Revision: https://reviews.llvm.org/D66163
llvm-svn: 369460
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
(concat_vectors X, Y), undef)) -> (concat_vectors X, Y, undef, undef)
I also had to add a new combine to X86's combineExtractSubvector to prevent a regression.
This helps our vXi1 code see the full concat operation and allow it optimize undef to a zero if there is already a zero in the concat. This helped us use a movzx instead of an AND in some of the tests. In those tests, one concat comes from SelectionDAGBuilder and the second comes from type legalization of v4i1->i4 bitcasts which uses an additional concat. Though these changes weren't my original motivation.
I'm looking at making X86ISelLowering's narrowShuffle emit a concat_vectors instead of an insert_subvector since concat_vectors is more canonical during early DAG combine. This patch helps prevent a regression from my experiments with that.
Differential Revision: https://reviews.llvm.org/D66456
llvm-svn: 369459
|
| |
|
|
|
|
|
|
| |
This reverts r367088 (git commit 9ad565f70ec5fd3531056d7c939302d4ea970c83)
And the follow up fix r368631 / e9865b9b31bb2e6bc742dc6fca8f9f9517c3c43e
llvm-svn: 369457
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds Wrapper classes for MCSymbol and MCSection into the XCOFF target
object writer. Also adds a class to represent the top-level sections, which we
materialize in the ObjectWriter.
executePostLayoutBinding will map all csects into the appropriate
container depending on its storage mapping class, and map all symbols
into their containing csect. Once all symbols have been processed we
- Assign addresses and symbol table indices.
- Calaculte section sizes.
- Build the section header table.
- Assign the sections raw-pointer value for non-virtual sections.
Since the .bss section is virtual, writing the header table is enough to
add support. Writing of a sections raw data, or of any relocations is
not included in this patch.
Testing is done by dumping the section header table, but it needs to be
extended to include dumping the symbol table once readobj support for
dumping auxiallary entries lands.
Differential Revision: https://reviews.llvm.org/D65159
llvm-svn: 369454
|
| |
|
|
| |
llvm-svn: 369444
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
StringMap is used for storing call target to frequency map for AutoFDO. However the iterating order of StringMap is non-deterministic, which leads to non-determinism in AutoFDO profile output. Now new API getSortedCallTargets and SortCallTargets are added for deterministic ordering and output.
Roundtrip test for text profile and binary profile is added.
Reviewers: wmi, davidxl, danielcdh
Subscribers: hiraditya, mgrang, llvm-commits, twoh
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66191
llvm-svn: 369440
|
| |
|
|
|
|
|
|
|
|
| |
(v16i1 X), 0)))) -> (i8 (trunc (i16 (bitcast (v16i1 X))))) on KNL target
Without AVX512DQ we don't have KMOVB so we can't really copy 8-bits of a k-register to a GPR. We have to copy 16 bits instead. We do this even if the DAG copy is from v8i1->v16i1. If we detect the (i8 (bitcast (v8i1 (extract_subvector (v16i1 X), 0)))) we should rewrite the types to match the copy we do support. By doing this, we can help known bits to propagate without losing the upper 8 bits of the input to the extract_subvector. This allows some zero extends to be removed since we have an isel pattern to use kmovw for (zero_extend (i16 (bitcast (v16i1 X))).
Differential Revision: https://reviews.llvm.org/D66489
llvm-svn: 369434
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
KMOVW and a SUBREG_TO_REG. Similar for i8 and anyextend.
We already had patterns for extending to i32 to take advantage of
the impliciting zeroing of the upper bits of a 32-bit GPR that is
done by KMOVW/KMOVB. But the extend might be all the way to i64,
in which case the existing patterns would fail and we'd get a
KMOVW/B followed by a MOVZX. By adding patterns for i64 we can
use the fact that KMOVW/B zero the upper bits of the 32-bit GPR
and the normal property that 32-bit GPR writes implicitly zero the
upper 32-bits of the full 64-bit GPR.
The anyextend patterns are slightly different since we don't care
about the upper zeros. For the i8->i64 I think this avoids selecting
the anyextend as a MOVZX to prevent a partial register issue that
doesn't exist. For i16->i64 I think we would have just emitted an
insert_subreg on top of the extract_subreg that the vXi16->i16
bitcast pattern emits. The register coalescer or peephole pass
should combine those, but this saves that work and makes i8/16
consistent.
llvm-svn: 369431
|
| |
|
|
|
|
|
|
| |
This avoids spurious relocation types for windows/elf targets.
Differential Revision: https://reviews.llvm.org/D66401
llvm-svn: 369426
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Debug sections are special in that they can contain relocations against
symbols that are not present in the final output (i.e. not live).
However it is also possible to have R_WASM_TABLE_INDEX relocations
against symbols that don't have a table index assigned (since they are
not address taken by actual code.
Fixes: https://github.com/emscripten-core/emscripten/issues/9023
Differential Revision: https://reviews.llvm.org/D66435
llvm-svn: 369423
|
| |
|
|
| |
llvm-svn: 369421
|
| |
|
|
|
|
|
| |
This reverts r367500 and r369203. This is causing various test
failures.
llvm-svn: 369417
|
| |
|
|
|
|
| |
This is a follow-up of r369365.
llvm-svn: 369412
|
| |
|
|
|
|
|
| |
We were creating 2 instructions and relying on a subsequent fold
to invert a not(icmp). Create the final icmp directly instead.
llvm-svn: 369411
|
| |
|
|
| |
llvm-svn: 369410
|
| |
|
|
|
|
|
|
| |
This patch adds vaddva selection.
Differential revision: https://reviews.llvm.org/D66410
llvm-svn: 369404
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
https://reviews.llvm.org/D66077
The value passed into dbg.value may relate to multiple registers,
each of which need a DBG_VALUE.
This fix calls MIRBuilder.buildDirectDbgValue for each register.
Without this, IR passed in from flang-compiler/flang may fail an
assertion in getOrCreateVReg.
Patch by : peterwaller-arm.
llvm-svn: 369403
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
For targets requiring aggressive scheduling and/or software pipeline we need to
apply predication before preRA scheduling. This adds a pass re-using the early
if-cvt infrastructure but generating predicated instructions instead of
speculatively executing instructions. It allows doing if conversion on blocks
containing instructions with side-effects. The pass re-use the target hook from
postRA if-conversion to let the target decide on the heuristic to apply.
Differential Revision: https://reviews.llvm.org/D66190
llvm-svn: 369395
|
| |
|
|
|
|
|
|
| |
1. Update function name and stale code comments.
2. Use variable names that are less ambiguous.
3. Move operand checks into the function as early exits.
llvm-svn: 369390
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When the line format is wrong, we may end up accessing out of bound
memory. eg: the test with invalide line will cause assert.
Assertion `idx < size()' failed
The fix is to report fatal when we found mismatched line format.
Reviewers: qcolombet, volkan
Reviewed By: qcolombet
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66444
llvm-svn: 369389
|
| |
|
|
|
|
|
|
|
| |
Latency and throughput of LOCK INC/DEC/NEG/NOT is always 19cy.
Number of uOPs is still 1.
Differential Revision: https://reviews.llvm.org/D66469
llvm-svn: 369388
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the original integer variant requested in:
https://bugs.llvm.org/show_bug.cgi?id=35607
As noted in the TODO and several similar TODOs around this block,
we could do this in instsimplify, but then it would cost more
because we would be trying to match min/max via ValueTracking
in 2 different places.
There are 4 commuted variants for each of smin/smax/umin/umax
that are not matched here. There are also icmp predicate variants
that are not included in the affected test file because they are
already handled by instsimplify by folding the final icmp to
true/false.
https://rise4fun.com/Alive/3KVc
Name: smax(smax, smin)
%c1 = icmp slt i32 %x, %y
%c2 = icmp slt i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp sgt i32 %max, %min
%r = select i1 %c3, i32 %max, i32 %min
=>
%r = %max
Name: smin(smax, smin)
%c1 = icmp slt i32 %x, %y
%c2 = icmp slt i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp sgt i32 %max, %min
%r = select i1 %c3, i32 %min, i32 %max
=>
%r = %min
Name: umax(umax, umin)
%c1 = icmp ult i32 %x, %y
%c2 = icmp ult i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp ult i32 %min, %max
%r = select i1 %c3, i32 %max, i32 %min
=>
%r = %max
Name: umin(umax, umin)
%c1 = icmp ult i32 %x, %y
%c2 = icmp ult i32 %y, %x
%min = select i1 %c1, i32 %x, i32 %y
%max = select i1 %c2, i32 %x, i32 %y
%c3 = icmp ult i32 %min, %max
%r = select i1 %c3, i32 %min, i32 %max
=>
%r = %min
llvm-svn: 369386
|
| |
|
|
|
|
|
|
|
|
| |
The type_offset field is 8 bytes long in DWARF64. The patch extends
TypeOffset to uint64_t and fixes its reading. The patch also fixes
checking of TypeOffset bounds as it was inaccurate in DWARF64 case.
Differential Revision: https://reviews.llvm.org/D66465
llvm-svn: 369378
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
the FDE location
Follow binutils in using RISCV_32_PCREL for the FDE initial location. As
explained in the relevant binutils commit
<https://github.com/riscv/riscv-binutils-gdb/commit/a6cbf936e3dce68114d28cdf60d510a3f78a6d40>,
the ADD/SUB pair of relocations is problematic in the presence of linker
relaxation.
This patch has the same end goal as D64715 but includes test changes and
avoids adding a new global VariantKind to MCExpr.h (preferring
RISCVMCExpr VKs like the rest of the RISC-V backend).
Differential Revision: https://reviews.llvm.org/D66419
llvm-svn: 369375
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This recommits r368977, which was reverted in r369027 due to test
failures in lldb. The cause of this was different behavior of
readNativeFileSlice on windows and unix. These have been addressed in
r369269.
The original commit message was:
In case the function was called with a desired read size *and* the file
was not an "mmap()" candidate, the function was falling back to a
"pread()", but it was failing to check the result of that system call.
This meant that the function would return "success" even though the read
operation failed, and it returned a buffer full of uninitialized memory.
Reviewers: rnk, dblaikie
Subscribers: kristina, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66224
llvm-svn: 369370
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On Jaguar, CMPXCHG has a latency of 11cy, and a maximum throughput of 0.33 IPC.
Throughput is superiorly limited to 0.33 because of the implicit in/out
dependency on register EAX. In the case of repeated non-atomic CMPXCHG with the
same memory location, store-to-load forwarding occurs and values for sequent
loads are quickly forwarded from the store buffer.
Interestingly, the functionality in LLVM that computes the reciprocal throughput
doesn't seem to know about RMW instructions. That functionality only looks at
the "consumed resource cycles" for the throughput computation. It should be
fixed/improved by a future patch. In particular, for RMW instructions, that
logic should also take into account for the write latency of in/out register
operands.
An atomic CMPXCHG has a latency of ~17cy. Throughput is also limited to
~17cy/inst due to cache locking, which prevents other memory uOPs to start
executing before the "lock releasing" store uOP.
CMPXCHG8rr and CMPXCHG8rm are treated specially because they decode to one less
macro opcode. Their latency tend to be the same as the other RR/RM variants. RR
variants are relatively fast 3cy (but still microcoded - 5 macro opcodes).
CMPXCHG8B is 11cy and unfortunately doesn't seem to benefit from store-to-load
forwarding. That means, throughput is clearly limited by the in/out dependency
on GPR registers. The uOP composition is sadly unknown (due to the lack of PMCs
for the Integer pipes). I have reused the same mix of consumed resource from the
other CMPXCHG instructions for CMPXCHG8B too.
LOCK CMPXCHG8B is instead 18cycles.
CMPXCHG16B is 32cycles. Up to 38cycles when the LOCK prefix is specified. Due to
the in/out dependencies, throughput is limited to 1 instruction every 32 (or 38)
cycles dependeing on whether the LOCK prefix is specified or not.
I wouldn't be surprised if the microcode for CMPXCHG16B is similar to 2x
microcode from CMPXCHG8B. So, I have speculatively set the JALU01 consumption to
2x the resource cycles used for CMPXCHG8B.
The two new hasLockPrefix() functions are used by the btver2 scheduling model
check if a MCInst/MachineInst has a LOCK prefix. Calls to hasLockPrefix() have
been encoded in predicates of variant scheduling classes that describe lat/thr
of CMPXCHG.
Differential Revision: https://reviews.llvm.org/D66424
llvm-svn: 369365
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: alexshap, jhenderson, rupprecht
Reviewed By: alexshap, jhenderson
Subscribers: abrachet, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65799
llvm-svn: 369348
|
| |
|
|
|
|
|
|
|
|
| |
The hack dated back to 2010 (r121076) and was documented by r122144:
// FIXME: The use if InSet = Addrs is a hack. Setting InSet causes us
// absolutize differences across sections and that is what the MachO writer
// uses Addrs for.
llvm-svn: 369337
|
| |
|
|
|
|
| |
after r369331
llvm-svn: 369334
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
line flag and all associated code, but leave it enabled by default
Google is reporting performance issues with the new default behavior
and have asked for a way to switch back to the old behavior while we
investigate and make fixes.
I've restored all of the code that had since been removed and added
additional checks of the command flag onto code paths that are
not otherwise guarded by a check of getTypeAction.
I've also modified the cost model tables to hopefully get us back
to the previous costs.
Hopefully we won't need to support this for very long since we
have no test coverage of the old behavior so we can very easily
break it.
llvm-svn: 369332
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before, we create the set of abstract attributes initially and then
dealt with the fact hat a lookup could fail, e.g., return a nullptr.
This patch will ensure we always return a valid object from a lookup,
allowing us not only to remove the nullptr checks but also to grow the
set of abstract attributes "in-flight" on-demand.
One can now start from those that have the best chance of improving
performance without the need to specify all they might depend on.
While this introduces some boilerplate, the usage of attributes is much
easier and cleaner now.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66276
llvm-svn: 369331
|
| |
|
|
| |
llvm-svn: 369330
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This is analogous to D66128 but for AADereferenceable. We have the logic
concentrated in the floating value updateImpl and we use the combiner
helper classes for arguments and return values.
The regressions will go away with "on-demand" attribute creation.
Improvements are already visible in the existing tests.
Reviewers: uenoku, sstefan1
Subscribers: hiraditya, bollu, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66272
llvm-svn: 369329
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
What D66126 did for AAAlign, this patch does for AANonNull. Agian, the
logic becomes more concise and localized. Again, returned poiners are
not annotated properly but that will not be an issue if this lands with
the "on-demand" generation of attributes. First improvements due to the
genericValueTraversal are already visible.
Reviewers: sstefan1, uenoku
Subscribers: hiraditya, bollu, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D66128
llvm-svn: 369328
|
| |
|
|
|
|
|
|
| |
The clamp operator should not take the known of the given state as the
known is potentially based on assumed information. This also adds TODOs
to guide improvements.
llvm-svn: 369327
|
| |
|
|
| |
llvm-svn: 369326
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Overriders may want to modify state in it. AMDGPU wants
to, but has to make its members mutable in order to do so.
Besides, EmitBasicBlockEnd is not const, so why should
Start be?
Patch by Bevin Hansson.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D66341
llvm-svn: 369325
|
| |
|
|
|
|
| |
dump() declarations
llvm-svn: 369324
|
| |
|
|
|
|
| |
builds after r369317
llvm-svn: 369318
|