| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
| |
Moving subprogram specific flags into DISPFlags makes IR code more readable.
In addition, we provide free space in DIFlags for other
'non-subprogram-specific' debug info flags.
Patch by Djordje Todorovic.
Differential Revision: https://reviews.llvm.org/D59288
llvm-svn: 356454
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
This is a recommit of r356442 with trivial fixes for the failing tests.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356451
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 1cf4b593a7ebd666fc6775f3bd38196e8e65fafe.
Build bots found failing tests not detected locally.
Failing Tests (3):
LLVM :: DebugInfo/Generic/convert-debugloc.ll
LLVM :: DebugInfo/Generic/convert-inlined.ll
LLVM :: DebugInfo/Generic/convert-linked.ll
llvm-svn: 356444
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce a DW_OP_LLVM_convert Dwarf expression pseudo op that allows
for a convenient way to perform type conversions on the Dwarf expression
stack. As an additional bonus it paves the way for using other Dwarf
v5 ops that need to reference a base_type.
The new DW_OP_LLVM_convert is used from lib/Transforms/Utils/Local.cpp
to perform sext/zext on debug values but mainly the patch is about
preparing terrain for adding other Dwarf v5 ops that need to reference a
base_type.
For Dwarf v5 the op maps to DW_OP_convert and for earlier versions a
complex shift & mask pattern is generated to emulate sext/zext.
Differential Revision: https://reviews.llvm.org/D56587
llvm-svn: 356442
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
- Make some class member methods const
- Delete unnecessary includes
- Use a simpler form of `BuildMI`
Reviewers: kripken
Subscribers: dschuff, sbc100, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59454
llvm-svn: 356440
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: tlively, sbc100
Subscribers: dschuff, jgravelle-google, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59469
llvm-svn: 356438
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Adds patterns to lower all the remaining setcc modes: lt, gt,
le, and ge. Fixes PR40912.
Reviewers: aheejin, sbc100, dschuff
Reviewed By: dschuff
Subscribers: jgravelle-google, hiraditya, sunfish, jdoerfert, llvm-commits, srj
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59519
llvm-svn: 356431
|
|
|
|
|
|
|
|
|
|
| |
computeConstantRange()"
This reverts commit 106f0cdefb02afc3064268dc7a71419b409ed2f3.
This change impacts the AMDGPU smed3.ll and umed3.ll codegen tests.
llvm-svn: 356424
|
|
|
|
|
|
| |
64-bit mode.
llvm-svn: 356419
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add support for min/max flavor selects in computeConstantRange(),
which allows us to fold comparisons of a min/max against a constant
in InstSimplify. This was suggested by spatel as an alternative
approach to D59378. I've also added the infinite looping test from
that revision here.
Differential Revision: https://reviews.llvm.org/D59506
llvm-svn: 356415
|
|
|
|
|
|
|
|
| |
extended 8-bit immediates.
We need to allow [128,255] in addition to [-128, 127] to match gas.
llvm-svn: 356413
|
|
|
|
|
|
| |
Forgot to add a change to relax some asserts in r356396.
llvm-svn: 356411
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NFC.
The default implementation does we want and is going to more compatible
with dynamic linking (-fPIC) support that is planned.
This is NFC because currently we only build wasm with
`-relocation-model=static` which in turn means that the default
`isOffsetFoldingLegal` always returns true today.
Differential Revision: https://reviews.llvm.org/D54661
llvm-svn: 356410
|
|
|
|
|
|
|
|
|
|
|
| |
This is preparation for D59506. The InstructionSimplify abs handling
is moved into computeConstantRange(), which is the general place for
such calculations. This is NFC and doesn't affect the existing tests
in test/Transforms/InstSimplify/icmp-abs-nabs.ll.
Differential Revision: https://reviews.llvm.org/D59511
llvm-svn: 356409
|
|
|
|
|
|
| |
consistent with the ROR patterns.
llvm-svn: 356407
|
|
|
|
|
|
| |
For the i8, i16, and i32 instructions we were using a relocImm. Presumably we should for i64 as well.
llvm-svn: 356406
|
|
|
|
|
|
|
|
|
|
| |
Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59501
llvm-svn: 356405
|
|
|
|
|
|
|
|
| |
This change makes linking into .build-id atomic and safe to use.
Some users under particular workflows are reporting that this races
more than half the time under particular conditions.
llvm-svn: 356404
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Allow the clamp modifier on vop3 int arithmetic instructions in assembly
and disassembly.
This involved adding a clamp operand to the affected instructions in MIR
and MC, and thus having to fix up several places in codegen and MIR
tests.
Differential Revision: https://reviews.llvm.org/D59267
Change-Id: Ic7775105f02a985b668fa658a0cd7837846a534e
llvm-svn: 356399
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit allows v_cndmask_b32_e64 with abs, neg source
modifiers on src0, src1 to be assembled and disassembled.
This does appear to be allowed, even though they are floating point
modifiers and the operand type is b32.
To do this, I added src0_modifiers and src1_modifiers to the
MachineInstr, which involved fixing up several places in codegen and mir
tests.
Differential Revision: https://reviews.llvm.org/D59191
Change-Id: I69bf4a8c73ebc65744f6110bb8fc4e937d79fbea
llvm-svn: 356398
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
After review comments, it was preferred to not teach MachineIRBuilder about
non-generic instructions beyond using buildInstr().
For AArch64 I've changed the buildCopy() calls to buildInstr() + a
separate addReg() call.
This also relaxes the MachineIRBuilder's COPY checking more because it may
not always have a SrcOp given to it.
llvm-svn: 356396
|
|
|
|
|
|
|
|
|
|
|
| |
Before, empty debug streams were written as 8 bytes (4 bytes signature + 4 bytes for the GlobalRefs count).
With this patch, unused empty streams aren't emitted anymore. Modules now encode 65535 as an 'unused stream' value, by convention.
Also fix the * Linker * contrib section which wasn't correctly emitted previously.
Differential Revision: https://reviews.llvm.org/D59502
llvm-svn: 356395
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
checks bot
This fixes a couple of unflushed raw_string_ostream bugs in recent
commits that only show up on a bot building on windows with expensive
checks.
Differential Revision: https://reviews.llvm.org/D59396
Change-Id: I9c6208325503b3ee0786b4b688e13fc24a15babf
llvm-svn: 356394
|
|
|
|
|
|
| |
relocImm8_su/relocImm16_su/relocImm32_su/ to accurately reflect what they are.
llvm-svn: 356393
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reinstates r347934, along with a tweak to address a problem with
PHI node ordering that that commit created (or exposed). (That commit
was reverted at r348426, due to the PHI node issue.)
Original commit message:
r320789 suppressed moving the insertion point of SCEV expressions with
dev/rem operations to the loop header in non-loop-invariant situations.
This, and similar, hoisting is also unsafe in the loop-invariant case,
since there may be a guard against a zero denominator. This is an
adjustment to the fix of r320789 to suppress the movement even in the
loop-invariant case.
This fixes PR30806.
Differential Revision: https://reviews.llvm.org/D57428
llvm-svn: 356392
|
|
|
|
|
|
|
|
|
|
|
| |
It uses the generic AArch64_IMM::expandMOVImm to get the correct
number of instruction used in immediate materialization.
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D58461
llvm-svn: 356391
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch follows some ideas from r352866 to optimize the floating
point materialization even further. It changes isFPImmLegal to
considere up to 2 mov instruction or up to 5 in case subtarget has
fused literals.
The rationale is the cost is the same for mov+fmov vs. adrp+ldr; but
the mov+fmov sequence is always better because of the reduced d-cache
pressure. The timings are still the same if you consider movw+movk+fmov
vs. adrp+ldr will be fused (although one instruction longer).
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D58460
llvm-svn: 356390
|
|
|
|
|
|
|
|
|
|
|
| |
This allows better code size for aarch64 floating point materialization
in a future patch.
Reviewers: evandro
Differential Revision: https://reviews.llvm.org/D58690
llvm-svn: 356389
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It splits the login of actual instruction emission away from the logic
that figures out the appropriate sequence on AArch64ExpandPseudo::expandMOVImm.
The new function AArch64_IMM::expandMOVImm, which return the list of the
instructions to materialize the immediate constant, is implemented on a
separated unit because it will be used in a subsequent patch to optimize
floating point materialization.
Reviewers: efriedma
Differential Revision: https://reviews.llvm.org/D58915
llvm-svn: 356387
|
|
|
|
|
|
|
|
|
|
| |
custom printing and custom parsing to achieve the same result and more
Similar to previous change done for VPCOM and VPCMP
Differential Revision: https://reviews.llvm.org/D59468
llvm-svn: 356384
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Delete temporarily constructed node uses for analysis after it's use,
holding onto original input nodes. Ideally this would be rewritten
without making nodes, but this appears relatively complex.
Reviewers: spatel, RKSimon, craig.topper
Subscribers: jdoerfert, hiraditya, deadalnix, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D57921
llvm-svn: 356382
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an experimental buffer fat pointer address space that is currently
unhandled in the backend. This commit reserves address space 7 as a
non-integral pointer repsenting the 160-bit fat pointer (128-bit buffer
descriptor + 32-bit offset) that is heavily used in graphics workloads
using the AMDGPU backend.
Differential Revision: https://reviews.llvm.org/D58957
llvm-svn: 356373
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
transforms
Follow-up to:
rL356338
rL356369
We can calculate an arbitrary vector constant minus the bitwidth, so there's
no need to limit this transform to scalars and splats.
llvm-svn: 356372
|
|
|
|
|
|
|
|
|
|
|
| |
Follow-up to:
rL356338
Rotates are a special case of funnel shift where the 2 input operands
are the same value, but that does not need to be a restriction for the
canonicalization when the shift amount is a constant.
llvm-svn: 356369
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Look past bitcasts when looking for parameter debug values that are
described by frame-index loads in `EmitFuncArgumentDbgValue()`.
In the attached test case we would be left with an undef `DBG_VALUE`
for the parameter without this patch.
A similar fix was done for parameters passed in registers in D13005.
This fixes PR40777.
Reviewers: aprantl, vsk, jmorse
Reviewed By: aprantl
Subscribers: bjope, javed.absar, jdoerfert, llvm-commits
Tags: #debug-info, #llvm
Differential Revision: https://reviews.llvm.org/D58831
llvm-svn: 356363
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes https://bugs.llvm.org/show_bug.cgi?id=35094
The Dead register definition pass should leave alone the atomicrmw
instructions on AArch64 (LTE extension). The reason is the following
statement in the Arm ARM:
"The ST<OP> instructions, and LD<OP> instructions where the destination
register is WZR or XZR, are not regarded as doing a read for the purpose
of a DMB LD barrier."
A good example was given in the gcc thread by Will Deacon (linked in the
bugzilla ticket 35094):
P0 (atomic_int* y,atomic_int* x) {
atomic_store_explicit(x,1,memory_order_relaxed);
atomic_thread_fence(memory_order_release);
atomic_store_explicit(y,1,memory_order_relaxed);
}
P1 (atomic_int* y,atomic_int* x) {
atomic_fetch_add_explicit(y,1,memory_order_relaxed); // STADD
atomic_thread_fence(memory_order_acquire);
int r0 = atomic_load_explicit(x,memory_order_relaxed);
}
P2 (atomic_int* y) {
int r1 = atomic_load_explicit(y,memory_order_relaxed);
}
My understanding is that it is forbidden for r0 == 0 and r1 == 2 after
this test has executed. However, if the relaxed add in P1 compiles to
STADD and the subsequent acquire fence is compiled as DMB LD, then we
don't have any ordering guarantees in P1 and the forbidden result could
be observed.
Change-Id: I419f9f9df947716932038e1100c18d10a96408d0
llvm-svn: 356360
|
|
|
|
| |
llvm-svn: 356359
|
|
|
|
|
|
|
|
|
| |
These are used to help convert OR->LEA when needed to avoid avoid a copy. They
aren't need after register allocation.
Happens to remove an ugly goto from X86MCCodeEmitter.cpp
llvm-svn: 356356
|
|
|
|
|
|
| |
All the other instructions are printed with a preceeding tab.
llvm-svn: 356355
|
|
|
|
|
|
|
|
|
|
|
|
| |
for all other printing functions.
The only thing the print methods currently need to know is the string to print for the memory size in intel syntax.
This patch merges the functions based on this string. If we ever need something else in the future, its easy to split them back out.
This reduces the number of cases in the assembly printers. It shrinks the intel printer to only use 7 bytes per instruction instead of 8.
llvm-svn: 356352
|
|
|
|
|
|
|
|
|
| |
AMDGPU would like to use these MVTs.
Differential Revision: https://reviews.llvm.org/D58901
Change-Id: I6125fea810d7cc62a4b4de3d9904255a1233ae4e
llvm-svn: 356351
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Exclude non-legal non-power-of-2 vector types from ComputeRegisterProp
mechanism in TargetLoweringBase::getTypeConversion.
* Cope with SETCC and VSELECT for odd-width i1 vector when the other
vectors are legal type.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58899
Change-Id: Ib5f23377dbef511be3a936211a0b9f94e46331f8
llvm-svn: 356350
|
|
|
|
|
|
|
| |
Fix up rL356335 by checking that CPSR is not read between
the compare and the branch.
llvm-svn: 356349
|
|
|
|
| |
llvm-svn: 356348
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are a few different issues, mostly stemming from using
generation based checks for anything instead of subtarget
features. Stop adding flat-address-space as a feature for HSA, as it
should only be a device property. This was incorrectly allowing flat
instructions to select for SI.
Increase the default generation for HSA to avoid the encoding error
when emitting objects. This has some other side effects from various
checks which probably should be separate subtarget features (in the
cost model and for dealing with the DS offset folding issue).
Partial fix for bug 41070. It should probably be an error to try using
amdhsa without flat support.
llvm-svn: 356347
|
|
|
|
|
|
| |
Following the suggestion in D59475.
llvm-svn: 356346
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the same change as rL356290, but for signed add. It replaces
the existing ripple logic with the overflow logic in ConstantRange.
This is NFC in that it should return NeverOverflow in exactly the
same cases as the previous implementation. However, it does make
computeOverflowForSignedAdd() more powerful by now also determining
AlwaysOverflows conditions. As none of its consumers handle this yet,
this has no impact on optimization. Making use of AlwaysOverflows
in with.overflow folding will be handled as a followup.
Differential Revision: https://reviews.llvm.org/D59450
llvm-svn: 356345
|
|
|
|
|
|
|
|
|
|
| |
of custom printing and custom parsing to achieve the same result and more
Similar to the previous patch for VPCOM.
Differential Revision: https://reviews.llvm.org/D59398
llvm-svn: 356344
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
custom printing and custom parsing to achieve the same result and more
Previously we had a regular form of the instruction used when the immediate was 0-7. And _alt form that allowed the full 8 bit immediate. Codegen would always use the 0-7 form since the immediate was always checked to be in range. Assembly parsing would use the 0-7 form when a mnemonic like vpcomtrueb was used. If the immediate was specified directly the _alt form was used. The disassembler would prefer to use the 0-7 form instruction when the immediate was in range and the _alt form otherwise. This way disassembly would print the most readable form when possible.
The assembly parsing for things like vpcomtrueb relied on splitting the mnemonic into 3 pieces. A "vpcom" prefix, an immediate representing the "true", and a suffix of "b". The tablegenerated printing code would similarly print a "vpcom" prefix, decode the immediate into a string, and then print "b".
The _alt form on the other hand parsed and printed like any other instruction with no specialness.
With this patch we drop to one form and solve the disassembly printing issue by doing custom printing when the immediate is 0-7. The parsing code has been tweaked to turn "vpcomtrueb" into "vpcomb" and then the immediate for the "true" is inserted either before or after the other operands depending on at&t or intel syntax.
I'd rather not do the custom printing, but I tried using an InstAlias for each possible mnemonic for all 8 immediates for all 16 combinations of element size, signedness, and memory/register. The code emitted into printAliasInstr ended up checking the number of operands, the register class of each operand, and the immediate for all 256 aliases. This was repeated for both the at&t and intel printer. Despite a lot of common checks between all of the aliases, when compiled with clang at least this commonality was not well optimized. Nor do all the checks seem necessary. Since I want to do a similar thing for vcmpps/pd/ss/sd which have 32 immediate values and 3 encoding flavors, 3 register sizes, etc. This didn't seem to scale well for clang binary size. So custom printing seemed a better trade off.
I also considered just using the InstAlias for the matching and not the printing. But that seemed like it would add a lot of extra rows to the matcher table. Especially given that the 32 immediates for vpcmpps have 46 strings associated with them.
Differential Revision: https://reviews.llvm.org/D59398
llvm-svn: 356343
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, v5f32. This
commit does not add them, but makes preparatory changes:
* Fixed assumptions of power-of-2 vector type in kernel arg handling,
and added v5 kernel arg tests and v3/v5 shader arg tests.
* Added v5 tests for cost analysis.
* Added vec3/vec5 arg test cases.
Some of this patch is from Matt Arsenault, also of AMD.
Differential Revision: https://reviews.llvm.org/D58928
Change-Id: I7279d6b4841464d2080eb255ef3c589e268eabcd
llvm-svn: 356342
|