| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
| 
| 
| 
|  | 
See discussion in D12154 ( http://reviews.llvm.org/D12154 ), AMD Software
Optimization Guides for 10H/12H/15H/16H, and Agner Fog's experimental data.
llvm-svn: 245733
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is a 'no functional change intended' patch. It removes one FIXME, but adds several more.
Motivation: the FeatureFastUAMem attribute may be too general. It is used to determine if any
sized misaligned memory access under 32-bytes is 'fast'. From the added FIXME comments, however,
you can see that we're not consistent about this. Changing the name of the attribute makes it
clearer to see the logic holes.
Changing this to a 'slow' attribute also means we don't have to add an explicit 'fast' attribute
to new chips; fast unaligned accesses have been standard for several generations of CPUs now.
Differential Revision: http://reviews.llvm.org/D12154
llvm-svn: 245729
 | 
| | 
| 
| 
|  | 
llvm-svn: 245715
 | 
| | 
| 
| 
|  | 
llvm-svn: 245705
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Note: I do not implement a base pointer, so it's still impossible to
have dynamic realignment AND dynamic alloca in the same function.
This also moves the code for determining the frame index reference
into getFrameIndexReference, where it belongs, instead of inline in
eliminateFrameIndex.
[Begin long-winded screed]
Now, stack realignment for Sparc is actually a silly thing to support,
because the Sparc ABI has no need for it -- unlike the situation on
x86, the stack is ALWAYS aligned to the required alignment for the CPU
instructions: 8 bytes on sparcv8, and 16 bytes on sparcv9.
However, LLVM unfortunately implements user-specified overalignment
using stack realignment support, so for now, I'm going to go along
with that tradition. GCC instead treats objects which have alignment
specification greater than the maximum CPU-required alignment for the
target as a separate block of stack memory, with their own virtual
base pointer (which gets aligned). Doing it that way avoids needing to
implement per-target support for stack realignment, except for the
targets which *actually* have an ABI-specified stack alignment which
is too small for the CPU's requirements.
Further unfortunately in LLVM, the default canRealignStack for all
targets effectively returns true, despite that implementing that is
something a target needs to do specifically. So, the previous behavior
on Sparc was to silently ignore the user's specified stack
alignment. Ugh.
Yet MORE unfortunate, if a target actually does return false from
canRealignStack, that also causes the user-specified alignment to be
*silently ignored*, rather than emitting an error.
(I started looking into fixing that last, but it broke a bunch of
tests, because LLVM actually *depends* on having it silently ignored:
some architectures (e.g. non-linux i386) have smaller stack alignment
than spilled-register alignment. But, the fact that a register needs
spilling is not known until within the register allocator. And by that
point, the decision to not reserve the frame pointer has been frozen
in place. And without a frame pointer, stack realignment is not
possible. So, canRealignStack() returns false, and
needsStackRealignment() then returns false, assuming everyone can just
go on their merry way assuming the alignment requirements were
probably just suggestions after-all. Sigh...)
Differential Revision: http://reviews.llvm.org/D12208
llvm-svn: 245668
 | 
| | 
| 
| 
|  | 
llvm-svn: 245661
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When producing conditional compare sequences for or operations we need
to negate the operands and the finally tested flags. The thing is if we negate
the finally tested flags this equals a logical negation of all previously
emitted expressions. There was a case missing where we have to order OR
expressions so they get emitted first.
This fixes http://llvm.org/PR24459
llvm-svn: 245641
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Create CMP;CCMP sequences from and/or trees does not gain us anything if
the and/or tree is materialized to a GP register anyway. While most of
the code already checked for hasOneUse() there was one important case
missing.
llvm-svn: 245640
 | 
| | 
| 
| 
|  | 
llvm-svn: 245636
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Fixes PR23464: one way to use the broadcast intrinsics is:
  _mm256_broadcastw_epi16(_mm_cvtsi32_si128(*(int*)src));
We don't currently fold this, but now that we use native IR for
the intrinsics (r245605), we can look through one bitcast to find
the broadcast scalar.
Differential Revision: http://reviews.llvm.org/D10557
llvm-svn: 245613
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Add an LSR test that exercises isTruncateFree. Without this change, LSR creates
another indvar representing the truncated value.
Reviewers: jholewinski, eliben
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D12058
llvm-svn: 245611
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Since r245605, the clang headers don't use these anymore.
r245165 updated some of the tests already; update the others, add
an autoupgrade, remove the intrinsics, and cleanup the definitions.
Differential Revision: http://reviews.llvm.org/D10555
llvm-svn: 245606
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
It won't go well. We've already marked 64-bit SETCCs as non-Custom, but it's just possible that a SETCC has a legal result type but an illegal operand type. If this happens, bail out before we create unselectable nodes.
Fixes PR24292. I tried to create a testcase but in 99% of cases we can't trigger this - not surprising that this bug has been latent since 2009.
llvm-svn: 245577
 | 
| | 
| 
| 
| 
| 
|  | 
Differential Revision: http://reviews.llvm.org/D12194
llvm-svn: 245575
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
FBLD and FBSTP should receive TBYTE because it is defined as
FBLD m80
FBSTP m80
Differential Revision: http://reviews.llvm.org/D11748
llvm-svn: 245553
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
COMISD should receive QWORD because it is defined as
 (V)COMISD xmm1, xmm2/m64
COMISS should receive DWORD because it is defined as
 (V)COMISS xmm1, xmm2/m32
Differential Revision: http://reviews.llvm.org/D11712
llvm-svn: 245551
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
We didn't check for the necessary preconditions before folding a
mask/shift into a single mask.
This fixes PR24516.
llvm-svn: 245544
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
XVCMPEQDP is used for VSX v2f64 equality comparisons, but the value type needs
to be v2i64 (as that's the corresponding SETCC type).
Fixes PR24225.
llvm-svn: 245535
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This DAGCombine was creating custom SDAG nodes with an illegal ppc_fp128
operand type because it was triggering on f64/f32 int2fp(fp2int(ppc_fp128 x)),
but shouldn't (it should only apply to f32/f64 types). The result was a crash.
llvm-svn: 245530
 | 
| | 
| 
| 
|  | 
llvm-svn: 245506
 | 
| | 
| 
| 
| 
| 
|  | 
maximums
llvm-svn: 245504
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
We are already falling back to SelectionDAG when encountering an shift with UB.
This adds the same checks for shifts with UB that get folded into arithmetic or
logical operations.
This fixes rdar://problem/22345295.
llvm-svn: 245499
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
We don't do a great job with >= 0 comparisons against zero when the
result is used as an i8.
Given something like:
  void f(long long LL, bool *B) {
    *B = LL >= 0;
  }
We used to generate:
  shrq    $63, %rdi
  xorb    $1, %dil
  movb    %dil, (%rsi)
Now we generate:
  testq   %rdi, %rdi
  setns   (%rsi)
Differential Revision: http://reviews.llvm.org/D12136
llvm-svn: 245498
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Previously WebAssembly's datalayout string had -v128:8:128. This had been an
attempt to declare a certain level of support for unaligned SIMD accesses.
However, clang makes its own determinations for SIMD alignment that are
independent of the datalayout string, so this wasn't actually meaningful.
llvm-svn: 245494
 | 
| | 
| 
| 
|  | 
llvm-svn: 245485
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This revision has introduced an issue that only affects bootstrapped compiler
when it is printing the ASM. I am working on resolving the issue, but in the
meantime, I'm disabling the legalization of scalar_to_vector operation for v2i64
and the associated testing until I can get this fixed.
llvm-svn: 245481
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Reintroduce r245442. Remove an overly conservative assertion introduced
in r245442. We could replace the assertion to use `shareSameRegisterFile`
instead, but in that point in `insertPHI` we already lost the original
Def subreg to check against. So drop the assertion completely.
Original commit message:
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C
B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C
C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0
Becomes:
A:
  psllq %mm1, %mm0
  jmp C
B:
  por %mm1, %mm0
  jmp C
C:
  pshufw  $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 245479
 | 
| | 
| 
| 
|  | 
llvm-svn: 245475
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Since r244955, we try to use the short-form ErrorInfo when both
tries failed, and the long-form match failed on a suffix operand.
However, this means we sometimes mix ErrorInfo and MatchResult
(one manifestation of this being PR24498). Instead, restore both.
llvm-svn: 245469
 | 
| | 
| 
| 
| 
| 
| 
|  | 
This reverts commit r245443, as it broke AArch64 test-suite tramp3d
with an assert "Reg && "Null register has no regunits".
llvm-svn: 245455
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch updates the X86 lowering so that the Exception Pointer and Selector
are 64-bit wide only if Subtarget.isTarget64BitLP64.
Patch by João Porto
Reviewers: dschuff, rnk
Differential Revision: http://reviews.llvm.org/D12111
llvm-svn: 245454
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
x32 has 32-bit pointers; x86-64 can't jmp %r32. This patch addresses this issue by explicitly zero-extending brind's target to 64-bits.
Author: jpp
Reviewers: jfb, dschuff, pavel.v.chupin
Subscribers: llvm-commits
Differential revision: http://reviews.llvm.org/D12112
llvm-svn: 245452
 | 
| | 
| 
| 
|  | 
llvm-svn: 245450
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
sources"
Revert r245442 while investigating a fix. An assertion hit in
http://lab.llvm.org:8080/green/job/clang-stage1-configure-RA_build/11380
llvm-svn: 245446
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
eliminate the trunc.
Differential Revision: http://reviews.llvm.org/D10442
llvm-svn: 245444
 | 
| | 
| 
| 
|  | 
llvm-svn: 245443
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Reapply r243486.
- Teaches the ValueTracker in the PeepholeOptimizer to look through PHI
instructions.
- Add findNextSourceAndRewritePHI method to lookup into multiple sources
returnted by the ValueTracker and rewrite PHIs with new sources.
With these changes we can find more register sources and rewrite more
copies to allow coaslescing of bitcast instructions. Hence, we eliminate
unnecessary VR64 <-> GR64 copies in x86, but it could be extended to
other archs by marking "isBitcast" on target specific instructions. The
x86 example follows:
A:
  psllq %mm1, %mm0
  movd  %mm0, %r9
  jmp C
B:
  por %mm1, %mm0
  movd  %mm0, %r9
  jmp C
C:
  movd  %r9, %mm0
  pshufw  $238, %mm0, %mm0
Becomes:
A:
  psllq %mm1, %mm0
  jmp C
B:
  por %mm1, %mm0
  jmp C
C:
  pshufw  $238, %mm0, %mm0
Differential Revision: http://reviews.llvm.org/D11197
rdar://problem/20404526
llvm-svn: 245442
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
The mid-end was generating vector smin/smax/umin/umax nodes, but
we were using vbsl to generatate the code. This adds the vmin/vmax
patterns and a test to check that we are now generating vmin/vmax
instructions.
Reviewers: rengolin, jmolloy
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: http://reviews.llvm.org/D12105
llvm-svn: 245439
 | 
| | 
| 
| 
|  | 
llvm-svn: 245437
 | 
| | 
| 
| 
| 
| 
| 
|  | 
This reverts commit 245169 which miscompiles MultiSource/Applications/siod
from LNT.
llvm-svn: 245432
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
optimizing for minsize
There are some cases where the mul sequence is smaller, but for the most part,
using a div is preferable. This does not apply to vectors, since x86 doesn't
have vector idiv, and a vector mul/shifts sequence ought to be smaller than a
scalarized division.
Differential Revision: http://reviews.llvm.org/D12082
llvm-svn: 245431
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This removes the isPow2SDivCheap() query, as it is not currently used in
any meaningful way. isIntDivCheap() no longer relies on a state variable
(as all in-tree target set it to false), but the interface allows querying
based on the type optimization level.
NFC.
Differential Revision: http://reviews.llvm.org/D12082
llvm-svn: 245430
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This commit adds support for bit mask target flag serialization to the MIR
printer and the MIR parser. It also adds support for the machine operand's
target flag serialization to the AArch64 target.
Reviewers: Duncan P. N. Exon Smith
llvm-svn: 245383
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This consolidates use of isUnalignedMem32Slow() in one place.
There is a slight change in logic although I'm not sure that it would ever
come up in the real world: we were assuming that an alignment of the type 
size is always fast; now, we actually check the data layout to confirm that.
llvm-svn: 245382
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
To properly handle this, define the *a instructions as separate
instruction classes by refactoring the LoadA and StoreA multiclasses.
Move the instruction tests into the sparcv9 file to test the difference.
llvm-svn: 245360
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
State numbers are calculated by performing a walk from the innermost
funclet to the outermost funclet.   Rudimentary support for the new EH
constructs has been added to the assembly printer, just enough to test
the new machinery.
Differential Revision: http://reviews.llvm.org/D12098
llvm-svn: 245331
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This method checks whether a physical regiser or any of its aliases are
used in the function.
Using this function in SIRegisterInfo::findUnusedReg() should also fix
this reported failure:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150803/292143.html
http://reviews.llvm.org/rL242173#inline-533
The report doesn't come with a testcase and I don't know enough about
AMDGPU to create one myself.
llvm-svn: 245329
 | 
| | 
| 
| 
| 
| 
| 
|  | 
These were missed when other uses were switched over:
http://llvm.org/viewvc/llvm-project?view=revision&revision=243994
llvm-svn: 245311
 | 
| | 
| 
| 
|  | 
llvm-svn: 245307
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary: This is the correct way to handle JAL instructions when PIC is enabled.
Patch by Toma Tabacu
Reviewers: seanbruno, tomatabacu
Subscribers: brooks, seanbruno, emaste, llvm-commits
Differential Revision: http://reviews.llvm.org/D6231
llvm-svn: 245305
 |