| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
intrinsic ones
Summary:
This change lowers the aarch64 integer vector min/max intrinsic nodes to
generic min/max nodes and replaces the intrinsic selection patterns with
the generic ones.
There should already be testing in place for this, so no further tests
were added.
Reviewers: jmolloy
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D12276
llvm-svn: 246030
|
| |
|
|
|
|
| |
Note that after this change branch probabilities are preserved now.
llvm-svn: 245998
|
| |
|
|
|
|
|
|
|
| |
This should be no functional change but for the record: For three cases
in X86FastISel this will change the order in which the FalseMBB and
TrueMBB of a conditional branch is addedd to the successor/predecessor
lists.
llvm-svn: 245997
|
| |
|
|
|
|
| |
Suggested by @sunfish as a follow-up to r245982.
llvm-svn: 245996
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This change makes the variable argument intrinsics, `llvm.va_start` and
`llvm.va_copy`, and the `va_arg` instruction behave as they do on Windows
inside a `CallingConv::X86_64_Win64` function. It's needed for a Clang patch
I have to add support for GCC's `__builtin_ms_va_list` constructs.
Reviewers: nadav, asl, eugenis
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1622
llvm-svn: 245990
|
| |
|
|
|
|
| |
WebAssembly will either use globals or immediates, since it's a virtual ISA.
llvm-svn: 245989
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Match spec format: https://github.com/WebAssembly/spec/blob/master/ml-proto/test/fac.wasm
Reviewers: sunfish
Subscribers: llvm-commits, jfb
Differential Revision: http://reviews.llvm.org/D12307
llvm-svn: 245986
|
| |
|
|
|
|
| |
Do the same for .weak (not implemented for now, but may as well to it). Update comment string to two semicolons.
llvm-svn: 245982
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a follow-on from the discussion in http://reviews.llvm.org/D12154.
This change allows memset/memcpy to use SSE or AVX memory accesses for any chip that has
generally fast unaligned memory ops.
A motivating use case for this change is a clang invocation that doesn't explicitly set
the CPU, but does target a feature that we know only exists on a CPU that supports fast
unaligned memops. For example:
$ clang -O1 foo.c -mavx
This resolves a difference in lowering noted in PR24449:
https://llvm.org/bugs/show_bug.cgi?id=24449
Before this patch, we used different store types depending on whether the example can be
lowered as a memset or not.
Differential Revision: http://reviews.llvm.org/D12288
llvm-svn: 245950
|
| |
|
|
|
|
|
| |
As of r245924, _ftol2 is no longer used for fptoui on MS platforms.
Remove the dead code associated with it.
llvm-svn: 245925
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes two issues in x86 fptoui lowering.
1) Makes conversions from f80 go through the right path on AVX-512.
2) Implements an inline sequence for fptoui i64 instead of a library
call. This improves performance by 6X on SSE3+ and 3X otherwise.
Incidentally, it also removes the use of ftol2 for fptoui, which was
wrong to begin with, as ftol2 converts to a signed i64, producing
wrong results for values >= 2^63.
Patch by: mitch.l.bodart@intel.com
Differential Revision: http://reviews.llvm.org/D11316
llvm-svn: 245924
|
| |
|
|
| |
llvm-svn: 245921
|
| |
|
|
|
|
|
|
| |
This reverts commit 433bfd94e4b7e3cc3f8b08f8513ce47817941b0c.
Broke some bot, have to see why it passed locally.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245917
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We removed access to the DataLayout on the TargetMachine and
deprecated the C API function LLVMGetTargetMachineData() in r243114.
However the way I tried to be backward compatible was broken: I
changed the wrapper of the TargetMachine to be a structure that
includes the DataLayout as well. However the TargetMachine is also
wrapped by the ExecutionEngine, in the more classic way. A client
using the TargetMachine wrapped by the ExecutionEngine and trying
to get the DataLayout would break.
It seems tricky to solve the problem completely in the C API
implementation. This patch tries to address this backward
compatibility in a more lighter way in the C++ API. The C API is
restored in its original state and the removed C++ API is
reintroduced, but privately. The C API is friended to the
TargetMachine and should be the only consumer for this API.
Reviewers: ributzka
Differential Revision: http://reviews.llvm.org/D12263
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245916
|
| |
|
|
|
|
|
|
| |
We might end up with a trivial copy as the addend, and if so, we should ignore
the corresponding FMA instruction. The trivial copy can be coalesced away later,
so there's nothing to do here. We should not, however, assert. Fixes PR24544.
llvm-svn: 245907
|
| |
|
|
| |
llvm-svn: 245895
|
| |
|
|
| |
llvm-svn: 245893
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Summary: Support function calls.
Reviewers: sunfish, sunfishcode
Subscribers: sunfishcode, jfb, llvm-commits
Differential revision: http://reviews.llvm.org/D12219
llvm-svn: 245887
|
| |
|
|
|
|
|
|
|
|
| |
Summary: I forgot to squash git commits before doing an svn dcommit of D12219. Reverting, and re-submitting.
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D12298
llvm-svn: 245886
|
| |
|
|
| |
llvm-svn: 245883
|
| |
|
|
| |
llvm-svn: 245882
|
| |
|
|
| |
llvm-svn: 245875
|
| |
|
|
| |
llvm-svn: 245872
|
| |
|
|
|
|
|
|
|
|
| |
This patch fixes PR24546, which demonstrates a segfault during the VSX
swap removal pass. The problem is that debug value instructions were
not excluded from the list of instructions to be analyzed for webs of
related computation. I've added the test case from the PR as a crash
test in test/CodeGen/PowerPC.
llvm-svn: 245862
|
| |
|
|
| |
llvm-svn: 245860
|
| |
|
|
| |
llvm-svn: 245859
|
| |
|
|
| |
llvm-svn: 245853
|
| |
|
|
| |
llvm-svn: 245852
|
| |
|
|
| |
llvm-svn: 245851
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12151
llvm-svn: 245835
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12232
llvm-svn: 245830
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12230
llvm-svn: 245829
|
| |
|
|
| |
llvm-svn: 245825
|
| |
|
|
|
|
|
| |
Reported by coverity.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 245800
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
__shared__ variable may now emit undef value as initializer, do not
throw error on that.
Test Plan: test/CodeGen/NVPTX/global-addrspace.ll
Patch by Xuetian Weng
Reviewers: jholewinski, tra, jingyue
Subscribers: llvm-commits, jholewinski
Differential Revision: http://reviews.llvm.org/D12242
llvm-svn: 245785
|
| |
|
|
|
|
|
|
| |
Although the basic s_load_* instructions happen to use the same
opcode, some of the special case SMRD instructions have
different opcodes.
llvm-svn: 245775
|
| |
|
|
| |
llvm-svn: 245774
|
| |
|
|
| |
llvm-svn: 245772
|
| |
|
|
| |
llvm-svn: 245769
|
| |
|
|
| |
llvm-svn: 245768
|
| |
|
|
|
|
| |
There are still a couple of CI patterns left in SIInstructions.
llvm-svn: 245767
|
| |
|
|
|
|
|
|
| |
The main change is inverting the condition for the
operand class classes so that VT.Size == 16 uses VGPR_32
instead of 64.
llvm-svn: 245764
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can wait on either VM, EXP or LGKM.
The waits are independent.
Without this patch, a wait inserted because of one of them
would also wait for all the previous others.
This patch makes s_wait only wait for the ones we need for the next
instruction.
Here's an example of subtle perf reduction this patch solves:
This is without the patch:
buffer_load_format_xyzw v[8:11], v0, s[44:47], 0 idxen
buffer_load_format_xyzw v[12:15], v0, s[48:51], 0 idxen
s_load_dwordx4 s[44:47], s[8:9], 0xc
s_waitcnt lgkmcnt(0)
buffer_load_format_xyzw v[16:19], v0, s[52:55], 0 idxen
s_load_dwordx4 s[48:51], s[8:9], 0x10
s_waitcnt vmcnt(1)
buffer_load_format_xyzw v[20:23], v0, s[44:47], 0 idxen
The s_waitcnt vmcnt(1) is useless.
The reason it is added is because the last
buffer_load_format_xyzw needs s[44:47], which was issued
by the first s_load_dwordx4. It waits for all VM
before that call to have finished.
Internally after every instruction, 3 counters (for VM, EXP and LGTM)
are updated after every instruction. For example buffer_load_format_xyzw
will
increase the VM counter, and s_load_dwordx4 the LGKM one.
Without the patch, for every defined register,
the current 3 counters are stored, and are used to know
how long to wait when an instruction needs the register.
Because of that, the s[44:47] counter includes that to use the register
you need to wait for the previous buffer_load_format_xyzw.
Instead this patch stores only the counters that matter for the
register,
and puts zero for the other ones, since we don't need any wait for them.
Patch by: Axel Davy
Differential Revision: http://reviews.llvm.org/D11883
llvm-svn: 245755
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D12040
llvm-svn: 245744
|
| |
|
|
|
|
|
|
|
|
| |
When PPCVSXFMAMutate would look at the input addend register, it would get its
input value number. This would fail, however, if the register was undef,
causing a segfault. Don't segfault (just skip such FMA instructions).
Fixes the test case from PR24542 (although that may have been over-reduced).
llvm-svn: 245741
|
| |
|
|
| |
llvm-svn: 245735
|
| |
|
|
|
|
|
| |
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
llvm-svn: 245733
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any
sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.
Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
Differential Revision: http://reviews.llvm.org/D12154
llvm-svn: 245729
|
| |
|
|
| |
llvm-svn: 245715
|
| |
|
|
| |
llvm-svn: 245705
|