| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
| |
First of a number of commits to remove x86 schedule itineraries entirely - approved off-line with @craig.topper
llvm-svn: 329893
|
| |
|
|
|
|
|
| |
This is better than listing FPd 30 times :-)
Review: Ulrich Weigand
llvm-svn: 329887
|
| |
|
|
|
|
|
|
| |
This is NFC, even though it caught just a few cases of overlapping regular
expressions.
Review: Ulrich Weigand
llvm-svn: 329886
|
| |
|
|
|
|
|
| |
Since the common code getWeakLeft() is now available, there should not
be a local copy of this function in target.
llvm-svn: 329885
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes tryCandidate() virtual and some utility functions like
tryLess(), tryGreater(), ... externally available (used to be static).
This makes it possible for a target to derive a new MachineSchedStrategy from
GenericScheduler and reuse most parts.
It was necessary to wrap functions with the same names in
AMDGPU/SIMachineScheduler in a local namespace.
Review: Andy Trick, Florian Hahn
https://reviews.llvm.org/D43329
llvm-svn: 329884
|
| |
|
|
|
|
|
|
| |
0 when promoting shift result type. NFC
Operand 0 should have the same type of the result. So if the result type needs to be promoted, operand 0 needs to be promoted unconditionally.
llvm-svn: 329883
|
| |
|
|
|
|
| |
"is is" -> "is", "if if" -> "if", "or or" -> "or"
llvm-svn: 329878
|
| |
|
|
|
|
|
|
| |
Also add double-prevoius-failure.ll which captures a test case that at one
point triggered a compiler crash, while developing calling convention support
for f64 on RV32D with soft-float ABI.
llvm-svn: 329877
|
| |
|
|
|
|
|
| |
This also includes support and a test for truncating stores, which are now
possible thanks to the fpround pattern.
llvm-svn: 329876
|
| |
|
|
| |
llvm-svn: 329874
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
conv
fadd.d is required in order to force floating point registers to be used in
test code, as parameters are passed in integer registers in the soft float
ABI.
Much of this patch is concerned with support for passing f64 on RV32D with a
soft-float ABI. Similar to Mips, introduce pseudoinstructions to build an f64
out of a pair of i32 and to split an f64 to a pair of i32. BUILD_PAIR and
EXTRACT_ELEMENT can't be used, as a BITCAST to i64 would be necessary, but i64
is not a legal type.
llvm-svn: 329871
|
| |
|
|
| |
llvm-svn: 329870
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're already removing allocsize attributes from Functions that we
remove args from, since removing arguments from a function may make the
allocsize attribute incorrect. It appears we forgot to also remove them
from callsites.
Without this, I get verifier errors on `@Test2`.
It probably wouldn't be too hard to make DAE properly update allocsize
attributes instead of dropping them, but I can't think of a scenario
where that'd be useful in practice.
llvm-svn: 329868
|
| |
|
|
|
|
| |
This reapplies commit r329644.
llvm-svn: 329865
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
order of evaluation.
The standard says that the order of evaluation of an expression
s[x] = foo()
is unspecified. In our case, we first create an empty entry in the map,
then call foo(), then store its return value to the created entry. The
problem is that foo uses the map as a cache, so if it finds that there
is an entry in the map, it stops computation. This change explicitly
sets the order, thus fixing this heisenbug.
llvm-svn: 329864
|
| |
|
|
| |
llvm-svn: 329862
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: This patch adds anchor() for MemoryBuffer, raw_fd_ostream, RTDyldMemoryManager, SectionMemoryManager, etc.
Reviewers: jlebar, eli.friedman, dblaikie
Reviewed By: dblaikie
Subscribers: mehdi_amini, mgorny, dblaikie, weimingz, llvm-commits
Differential Revision: https://reviews.llvm.org/D45244
llvm-svn: 329861
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Without these functions it's hard to create a TargetMachine for
Orc JIT that creates efficient native code.
It's not sufficient to just expose LLVMGetHostCPUName(), because
for some CPUs there's fewer features actually available than
the CPU name indicates (e.g. AVX might be missing on some CPUs
identified as Skylake).
Differential Revision: https://reviews.llvm.org/D44861
llvm-svn: 329856
|
| |
|
|
|
|
|
|
|
|
| |
This patch fixes https://bugs.llvm.org/show_bug.cgi?id=37039
The condition only covers one of the two 64-bit rotate instructions. This just
adds the second (RLDICLo).
Patch by Josh Stone.
llvm-svn: 329852
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently, an invalid asm insn, either in an asm file or
in an inline asm format, might be silently dropped. This patch
fixed two places where this may happen by
signaling the error so user knows what goes wrong.
The following is an example to demonstrate error messages:
-bash-4.2$ cat t.c
int test(void *ctx) {
#if defined(NO_ERROR)
asm volatile("r0 = *(u16 *)skb[%0]" : : "i"(2));
#elif defined(ERROR_1)
asm volatile("r20 = *(u16 *)skb[%0]" : : "i"(2));
#elif defined(ERROR_2)
asm volatile("r0 = *(u16 *)(r1 + ?)" : :);
#endif
return 0;
}
-bash-4.2$ cat run.sh
for macro in NO_ERROR ERROR_1 ERROR_2; do
echo "===== compile for macro" $macro
clang -D${macro} -O2 -target bpf -emit-llvm -S t.c
echo "==llc=="
llc -march=bpf -filetype=obj t.ll
done
-bash-4.2$ ./run.sh
===== compile for macro NO_ERROR
==llc==
===== compile for macro ERROR_1
==llc==
<inline asm>:1:2: error: invalid register/token name
r20 = *(u16 *)skb[2]
^
note: !srcloc = 135
===== compile for macro ERROR_2
==llc==
<inline asm>:1:21: error: unexpected token
r0 = *(u16 *)(r1 + ?)
^
note: !srcloc = 210
-bash-4.2$
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
llvm-svn: 329849
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Similar to the wbinvd instruction, except this
one does not invalidate caches. Ring 0 only.
The encoding matches a wbinvd instruction with
an F3 prefix.
Reviewers: craig.topper, zvi, ashlykov
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D43816
llvm-svn: 329847
|
| |
|
|
|
|
|
|
|
| |
Most importantly, we should not replace slashes with backslashes
because that would invalidate the path.
Differential Revision: https://reviews.llvm.org/D45473
llvm-svn: 329838
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Atom is the only x86 target that still uses schedule itineraries, if we can remove this then we can begin the work on removing x86 itineraries. I've also found that it will help with PR36550.
I've focussed on matching the existing model as closely as possible (relying on the schedule tests), PR36895 indicated a lot of these were incorrect but we can just as easily fix these after this patch as before. Hopefully we can get llvm-exegesis to help here,
There are a few instructions that rely on itinerary scheduling (mainly push/pop/return) of multiple resource stages, but I don't think any of these are show stoppers.
There are also a few codegen changes that seem related to the post-ra scheduler acting a little differently, I haven't tracked these down but they don't seem critical.
NOTE: I don't have access to any Atom hardware, so this hasn't been tested in the wild.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329837
|
| |
|
|
|
|
|
|
|
|
| |
Pre-commit for D45486, don't rely on itinerary scheduler model to determine latencies for padding, use the generic TargetSchedModel::computeInstrLatency call.
Also, replace hard coded (atom specific) 2*uop creation per padding cycle with a version based on the scheduler model's issue width.
Differential Revision: https://reviews.llvm.org/D45486
llvm-svn: 329834
|
| |
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329830
|
| |
|
|
|
|
|
|
|
|
| |
When NVPTX TARGET_BUILTIN specifies sm_XX or ptxYY as required feature,
consider those features available if we're compiling for GPU >= sm_XX or have
enabled PTX version >= ptxYY.
Differential Revision: https://reviews.llvm.org/D45061
llvm-svn: 329829
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Re-landed after noticing that the buildbot failure from 329808 seemed to
be unrelated.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329826
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is causing compilation timeouts on code with long sequences of
local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out
that code coverage instrumentation is a great way to create sequences
like this, which how our users ran into the issue in practice.
Intel has a tool that detects these kinds of non-linear compile time
issues, and Andy Kaylor reported it as PR37010.
The current sinking code scans the whole basic block once per local
value sink, which happens before emitting each call. In theory, local
values should only be introduced to be used by instructions between the
current flush point and the last flush point, so we should only need to
scan those instructions.
llvm-svn: 329822
|
| |
|
|
| |
llvm-svn: 329821
|
| |
|
|
|
|
|
|
|
|
| |
Previously the MD5 option of the .file directive provided the checksum
as a quoted hex string; now it's a normal hex number with 0x prefix,
same as the .octa directive accepts.
Differential Revision: https://reviews.llvm.org/D45459
llvm-svn: 329820
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Add the minimal support necessary to lower a function that returns the
sum of two i32 values.
Support argument/return lowering of i32 values through registers only.
Add tablegen for regbankselect and instructionselect.
Patch by Petar Avramovic.
Differential Revision: https://reviews.llvm.org/D44304
llvm-svn: 329819
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Two issues were fixed:
runtime has difficulty to allocate memory for an external symbol of a
kernel and set the address of the external symbol, therefore make the runtime
handle of an enqueued kernel an ordinary global variable. Runtime only needs
to store the address of the loaded kernel to the handle and has verified
that this approach works.
handle the situation where __enqueue_kernel* gets inlined therefore
the enqueued kernel may be used through a constant expr instead
of an instruction.
Differential Revision: https://reviews.llvm.org/D45187
llvm-svn: 329815
|
| |
|
|
|
|
|
|
|
| |
This reverts 329808. That change caused a report of a failure in
test/CodeGen/MIR/AMDGPU/mir-canon-multi.mir that I didn't see. I suspect
it is an expensive-check-only error.
Change-Id: I8133f26f15e7d5ec2b09c687c12cd70e918461b0
llvm-svn: 329811
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Place parsing of a vector index into a separate function to reduce
duplication, since the code is duplicated in both the parsing of a
Neon vector register operand and a Neon vector list.
This is patch [2/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: rengolin
Subscribers: kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45428
llvm-svn: 329809
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This fixes the number of SGPRs and VGPRs in the *_RSRC1 register to
allow for registers set up in wave dispatch, even if those registers are
not used in the shader.
Subscribers: arsenm, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D45503
Change-Id: I6575f0e0d2a528d1319d0b289f0ebe4510fa5771
llvm-svn: 329808
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Split variable index shuffles from immediate index shuffles
WriteFVarShuffle - variable 'in-lane' shuffles (VPERMILPS/VPERMIL2PS etc.)
WriteVarShuffle - variable 'in-lane' shuffles (PSHUFB/VPPERM etc.)
WriteFVarShuffle256 - variable 'cross-lane' shuffles (VPERMPS etc.)
WriteVarShuffle256 - variable 'cross-lane' shuffles (VPERMD etc.)
Differential Revision: https://reviews.llvm.org/D45404
llvm-svn: 329806
|
| |
|
|
|
|
|
|
|
| |
See bug 36845: https://bugs.llvm.org/show_bug.cgi?id=36845
Differential Revision: https://reviews.llvm.org/D45443
Reviewers: artem.tamazov, arsenm, timcorringham
llvm-svn: 329801
|
| |
|
|
|
|
|
|
|
|
|
|
| |
In r329691, we would choose FP even if the offset wouldn't fit, just
because the offset is smaller than the one from BP. This made many
accesses through FP need to scavenge a register, which resulted in
slower and bigger code for no good reason.
This patch now always picks the offset that fits first, even if FP is
preferred.
llvm-svn: 329797
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Bitwise 'not' of the min/max could be eliminated in the pattern:
%notx = xor i32 %x, -1
%cmp1 = icmp sgt[slt/ugt/ult] i32 %notx, %y
%smax = select i1 %cmp1, i32 %notx, i32 %y
%res = xor i32 %smax, -1
https://rise4fun.com/Alive/lCN
Reviewers: spatel
Reviewed by: spatel
Subscribers: a.elovikov, llvm-commits
Differential Revision: https://reviews.llvm.org/D45317
llvm-svn: 329791
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow up of rL327695 to instruction select more variants of VSELGT
and VSELGE, for which it is necessary to custom lower SELECT.
More work is required in this area, which will be addressed soon:
- more variants need to be regression tested, but this depends on the next point.
- first LowerConstantFP need to be adjusted for fp16 values.
Differential Revision: https://reviews.llvm.org/D45205
llvm-svn: 329788
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Merged 'tryMatchVectorRegister' (specific to Neon) and
'tryParseSVERegister' into a single 'tryParseVectorRegister' function, and
created a generic 'parseVectorKind()' function that returns the #Elements
and ElementWidth of a vector suffix. This reduces the duplication of
this functionality between two the vector implementations.
This is patch [1/6] in a series to add assembler/disassembler support for
SVE's contiguous ST1 (scalar+imm) instructions.
Reviewers: fhahn, rengolin, javed.absar, huntergr, SjoerdMeijer, t.p.northover, echristo, evandro
Reviewed By: fhahn
Subscribers: tschuett, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D45427
llvm-svn: 329782
|
| |
|
|
|
|
|
|
| |
512-bit masked intrinsic with unmasked intrinsic and a select.
The 128/256-bit versions were no longer used by clang. It uses the legacy SSE/AVX2 version and a select. The 512-bit was changed to the same for consistency.
llvm-svn: 329774
|
| |
|
|
|
|
|
|
|
|
| |
an explicit MOV8mr instruction.
Previously the code only knew how to handle setcc to a register.
This should fix a crash in the chromium build.
llvm-svn: 329771
|
| |
|
|
|
|
|
|
|
|
| |
With -fno-plt, for example, calls to printf when getting converted to puts
still use the PLT. This patch checks for the metadata "RtLibUseGOT" and
annotates the declaration with the right attributes.
Differential Revision: https://reviews.llvm.org/D45180
llvm-svn: 329768
|
| |
|
|
|
|
|
|
| |
With -fno-plt, global value references can use GOTPCREL and RIP must be used.
Differential Revision: https://reviews.llvm.org/D45460
llvm-svn: 329765
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Samuel Pitoiset
ds_read_b128 and ds_write_b128 have been recently enabled
under the amdgpu-ds128 option because the performance benefit
is unclear.
Though, using 128-bit loads/stores for the local address space
appears to introduce regressions in tessellation shaders. Not
sure what is broken, but as ds_read_b128/ds_write_b128 are not
enabled by default, just introduce a global option and enable
128-bit only if requested (until it's fixed/used correctly).
v2: - fix regressions in merge-stores.ll and multiple_tails.ll
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105464
llvm-svn: 329764
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
When inserting MOVs to avoid Falkor HWPF collisions, the non-base
register operand of load instructions (e.g. a register offset) was not
being considered live, so it could potentially have been used as a
scratch register, clobbering the actual offset value.
Reviewers: mcrosier
Subscribers: rengolin, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D45502
llvm-svn: 329761
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
edge values
This is based on an example that was recently posted on llvm-dev:
void *propagate_null(void* b, int* g) {
if (!b) {
return 0;
}
(*g)++;
return b;
}
https://godbolt.org/g/xYk3qG
The original code or constant propagation in other passes has obscured the fact
that the phi can be removed completely.
Differential Revision: https://reviews.llvm.org/D45448
llvm-svn: 329755
|
| |
|
|
|
|
|
|
|
| |
Summary:
The verification rules for the intrinsics for atomic memcpy, atomic memmove,
and atomic memset are basically code clones. This change merges their verification
rules into a single block to remove duplication.
llvm-svn: 329753
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Darwin dynamic linker can handle weak symbols in ConstDataSection.
ReadonReadOnlyWithRel symbols should be emitted in ConstDataSection
instead of normal DataSection.
rdar://problem/39298457
Reviewers: dexonsmith, kledzik
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D45472
llvm-svn: 329752
|