| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
| |
conditional branches for very large targets. That will be the next small
patch. Everything should now, in principle, work as well (functionality
wise) as without constant islands, so we decided at Mips/Imagination to
make constant islands the default for Mips16 now so that it will get
exercised a lot. This port is still experimental, though hopefully we
will change that status soon. Some more cleanup and code review are in
order, but things are converging fast.
llvm-svn: 195902
|
| |
|
|
|
|
|
|
|
|
|
| |
make PIC calls a little more efficient:
1. Remove instructions setting up $gp if it is known that a function has been
called at least once.
2. Save the address of a called function in a register instead of loading
it from the GOT at every call site.
llvm-svn: 195892
|
| |
|
|
|
| |
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195881
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.
v2:
- Fix encoding of Lane Mask
- Use correct register flags, so we don't overwrite the low dword
when restoring multi-dword registers.
v3:
- Register spilling seems to hang the GPU, so replace all shaders
that need spilling with a dummy shader.
v4:
- Fix *LANE definitions
- Change destination reg class for 32-bit SMRD instructions
v5:
- Remove small optimization that was crashing Serious Sam 3.
https://bugs.freedesktop.org/show_bug.cgi?id=68224
https://bugs.freedesktop.org/show_bug.cgi?id=71285
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195880
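As a rough illustration of the spilling idea described above, the sketch below models a vector register as an array of 32-bit lanes and spills a scalar by writing it into one lane. This is a conceptual model, not the backend code: the names `VectorReg`, `writeLane`, `readLane`, `spill64` and `restore64` are hypothetical, and the lane count of 64 is an assumption made for the example. The 64-bit helpers show why a multi-dword restore has to address each dword separately.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Conceptual model only: one 32-bit slot per lane. The lane count is an
// assumption for illustration.
constexpr unsigned kNumLanes = 64;
using VectorReg = std::array<uint32_t, kNumLanes>;

// Models a WRITELANE-style spill: store one scalar dword into one lane.
inline void writeLane(VectorReg &v, unsigned lane, uint32_t scalar) {
  assert(lane < kNumLanes);
  v[lane] = scalar;
}

// Models a READLANE-style restore: read one scalar dword back from a lane.
inline uint32_t readLane(const VectorReg &v, unsigned lane) {
  assert(lane < kNumLanes);
  return v[lane];
}

// Hypothetical helper: a 64-bit scalar occupies two consecutive lanes.
inline void spill64(VectorReg &v, unsigned lane, uint64_t scalar) {
  writeLane(v, lane, static_cast<uint32_t>(scalar));            // low dword
  writeLane(v, lane + 1, static_cast<uint32_t>(scalar >> 32));  // high dword
}

// Restores both dwords independently so neither half clobbers the other.
inline uint64_t restore64(const VectorReg &v, unsigned lane) {
  uint64_t lo = readLane(v, lane);
  uint64_t hi = readLane(v, lane + 1);
  return (hi << 32) | lo;
}
```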
|
| |
|
|
|
|
|
|
| |
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195879
|
| |
|
|
|
| |
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195878
|
| |
|
|
|
|
| |
In particular, check the name of the symbol we are putting in the constant pool.
llvm-svn: 195865
|
| |
|
|
|
|
| |
of ACLE intrinsics.
llvm-svn: 195843
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is only used for asm printing.
On X86 we put basic block addresses on register before passing them to inline
asm, so the MO_MachineBasicBlock case was dead.
MO_ExternalSymbol was dead since any symbol being passed to inline asm
is represented as MO_GlobalAddress.
The MO_GlobalAddress and MO_Register cases were not tested.
llvm-svn: 195824
|
| |
|
|
| |
llvm-svn: 195803
|
| |
|
|
|
|
| |
instructions.
llvm-svn: 195788
|
| |
|
|
|
|
|
| |
The determination of when we are doing constant pools was being made too
early in the asm printer.
llvm-svn: 195781
|
| |
|
|
|
|
|
|
| |
- Fix bug in (vsext (vzext x)) -> (vsext x) in SIGN_EXTEND_IN_REG
lowering where we need to check whether x is a vector type (in-reg
type) of i8, i16 or i32; otherwise, that optimization is not valid.
llvm-svn: 195779
|
| |
|
|
| |
llvm-svn: 195759
|
| |
|
|
|
|
|
|
|
| |
We would wrongly transform the testcase into the equivalent of an AND with 1.
The problem was that, when testing whether the shifted-in bits of the right
shift were significant, we used the width of the final zero-extended result
rather than the width of the shifted value.
llvm-svn: 195731
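The width mix-up described above can be sketched with a small mask computation. This is only an illustration with hypothetical helper names (`lowMask`, `shiftedInMask`, `shiftedInBitsMatter`), not the code from the patch: a logical right shift fills the top `shift` bits of the shifted value with zeros, so the positions to test depend on the width of the shifted value, not on the width of a later zero-extended result.

```cpp
#include <cstdint>

// Mask covering the low `width` bits (1 <= width <= 64).
constexpr uint64_t lowMask(unsigned width) {
  return width >= 64 ? ~uint64_t{0} : (uint64_t{1} << width) - 1;
}

// Bit positions that a logical right shift by `shift` fills with zeros,
// for a value that is `width` bits wide (0 < shift < width).
constexpr uint64_t shiftedInMask(unsigned width, unsigned shift) {
  return lowMask(width) & ~(lowMask(width) >> shift);
}

// Hypothetical check: do any shifted-in zeros land on bits that a later
// use (described by `usedBits`) actually reads?
constexpr bool shiftedInBitsMatter(unsigned width, unsigned shift,
                                   uint64_t usedBits) {
  return (shiftedInMask(width, shift) & usedBits) != 0;
}
```

With a 32-bit value shifted right by 8 and then zero-extended to 64 bits, the shifted-in zeros occupy bits 24 to 31. Testing with a width of 64 instead looks at bits 56 to 63 and concludes that a use of the low 32 bits is unaffected, which is the kind of wrong answer the fix addresses.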
|
| |
|
|
|
|
|
| |
and TRN.
Fix a bug when mixed use of vget_high_u8() and vuzp_u8().
llvm-svn: 195716
|
| |
|
|
| |
llvm-svn: 195713
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A Direct stack map location records the address of a frame index. This
address is itself the value that the runtime requested. This differs
from IndirectMemRefOp locations, which refer to stack locations from
which the requested values must be loaded. Direct locations can
directly communicate the address of an alloca, while IndirectMemRefOp
locations handle register spills.
For example:
entry:
%a = alloca i64...
llvm.experimental.stackmap(i32 <ID>, i32 <shadowBytes>, i64* %a)
Since both the alloca and stackmap intrinsic are in the entry block,
and the intrinsic takes the address of the alloca, the runtime can
assume that LLVM will not substitute alloca with any intervening
value. This must be verified by the runtime by checking that the stack
map's location is a Direct location type. The runtime can then
determine the alloca's relative location on the stack immediately after
compilation, or at any time thereafter. This differs from Register and
Indirect locations, because the runtime can only read the values in
those locations when execution reaches the instruction address of the
stack map.
llvm-svn: 195712
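The difference between the location kinds can be sketched from the runtime's point of view. The code below is a minimal model under stated assumptions, not the stack map record format: `LocKind`, `Location`, `Memory` and `resolve` are hypothetical names, and memory is modelled as a simple address-to-value map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Location kinds discussed in the commit message.
enum class LocKind { Register, Direct, Indirect };

struct Location {
  LocKind kind;
  int32_t offset;  // byte offset from the base register; unused for Register
};

// Toy memory for the sketch: address -> 64-bit value stored there.
using Memory = std::map<uint64_t, uint64_t>;

// Returns the value the runtime asked for, given the base register contents.
uint64_t resolve(const Location &loc, uint64_t baseReg, const Memory &mem) {
  // Sign-extend the offset before adding so negative offsets work.
  uint64_t addr = baseReg + static_cast<uint64_t>(static_cast<int64_t>(loc.offset));
  switch (loc.kind) {
  case LocKind::Register:
    return baseReg;  // the value lives in the register itself
  case LocKind::Direct:
    return addr;     // the computed address is the requested value
  case LocKind::Indirect: {
    auto it = mem.find(addr);  // the value must be loaded from the slot
    assert(it != mem.end() && "spill slot not present in toy memory");
    return it->second;
  }
  }
  return 0;  // unreachable for valid kinds
}
```

Only the Indirect case touches memory, which is why a Direct location can be interpreted without reading stack contents, while Register and Indirect values are meaningful only when execution is at the stack map's instruction address.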
|
| |
|
|
| |
llvm-svn: 195697
|
| |
|
|
|
|
|
| |
I'm not sure how it was checking for the wrong values...
PR18023.
llvm-svn: 195670
|
| |
|
|
|
|
| |
Patch by Oliver Stannard.
llvm-svn: 195640
|
| |
|
|
| |
llvm-svn: 195636
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Moved the requirement for SelectionDAG::getConstant() to return legally
typed nodes slightly earlier. There were two optional DAGCombine passes
that were missed out and were required to produce type-legal DAGs.
Simplified a code-path in tryFoldToZero() to use SelectionDAG::getConstant().
This provides support for both promoted and expanded vector types whereas the
previous code only supported promoted vector types.
Fixes a "Type for zero vector elements is not legal" assertion detected by
an llvm-stress generated test.
Reviewers: resistor
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D2251
llvm-svn: 195635
|
| |
|
|
|
|
|
| |
A volatile load should block us from trying to coalesce stores.
PR18023
llvm-svn: 195599
|
| |
|
|
|
|
| |
sethi+or. This generates correct code for both sparc32 and sparc64.
llvm-svn: 195576
|
| |
|
|
|
|
| |
clobbered by calls but not used in the function itself.
llvm-svn: 195574
|
| |
|
|
| |
llvm-svn: 195573
|
| |
|
|
|
|
|
|
|
|
| |
to what is needed for constant islands. The prescan method for Mips16 constant
islands will eventually go away. It is only temporary and should be done
earlier when the instructions are first created or from the DAG. If we keep
it here, we need to better handle the situation where constant islands
is called multiple times, since we don't want to prescan more than once.
llvm-svn: 195569
|
| |
|
|
| |
llvm-svn: 195566
|
| |
|
|
|
|
|
|
|
|
|
|
| |
I had to move some code, and I moved a declaration forward past its first use
in the function. By nutty coincidence there was another variable of the same
name and type, with a completely unrelated function, declared globally
in the class, so no compilation error ensued.
It required some unusual conditions for it to even matter. Caused test
case casts.c in test-suite to fail during compilation with a duplicate
symbol error. I would have noticed it during final code review for this port.
llvm-svn: 195565
|
| |
|
|
|
|
|
|
|
|
| |
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
Make tests more robust by removing hard-coded metadata numbers in CHECK lines.
llvm-svn: 195535
|
| |
|
|
|
|
|
|
| |
We were ignoring the ordered/unordered bits and also the signed/unsigned
bits of condition codes when lowering the DAG to MachineInstrs.
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195514
|
| |
|
|
|
|
|
|
| |
We are going to drop debug info without a version number or with a different
version number, to make sure we don't crash when we see bitcode files with
different debug info metadata format.
llvm-svn: 195504
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Utilizing the 8 and 16 bit comparison instructions, even when an input can
be folded into the comparison instruction itself, is typically not worth it.
There are too many partial register stalls as a result, leading to significant
slowdowns. By always performing comparisons on at least 32-bit
registers, performance of the calculation chain leading to the
comparison improves. Continue to use the smaller comparisons when
minimizing size, as that allows better folding of loads into the
comparison instructions.
rdar://15386341
llvm-svn: 195496
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Improvements over r195317:
- Set/restore EnableFastISel flag instead of just running FastISel within
SelectAllBasicBlocks; the flag is checked in various places, and
FastISel won't run properly if those places don't do the right thing.
- Test looks for normal ISel versus FastISel behavior, and not
something more subtle that doesn't work everywhere.
Based on work by Andrea Di Biagio.
llvm-svn: 195491
|
| |
|
|
|
|
| |
optimizes Constant values now.
llvm-svn: 195488
|
| |
|
|
|
|
|
| |
- When simplifying the mask generation for BLEND, check whether that mask is
also consumed by other non-BLEND insns. If true, skip that simplification.
llvm-svn: 195476
|
| |
|
|
|
|
|
|
|
|
|
|
| |
I've no idea why I decided to handle TMxx differently from all the other
high/low logic operations, but it was a stupid thing to do. The high
registers aren't available as separate 32-bit registers on z10,
so subreg_h32 can't be used on a GR64 there.
I've normally been testing with z196 and with -O3 and so hadn't noticed
this until now.
llvm-svn: 195473
|
| |
|
|
| |
llvm-svn: 195469
|
| |
|
|
|
|
| |
The callee will not pop the stack for us.
llvm-svn: 195467
|
| |
|
|
| |
llvm-svn: 195457
|
| |
|
|
|
|
| |
Patch by Oliver Stannard!
llvm-svn: 195448
|
| |
|
|
|
|
|
|
| |
from the appropriate integer vector type.
Fixes an instruction selection failure detected by llvm-stress.
llvm-svn: 195444
|
| |
|
|
| |
llvm-svn: 195439
|
| |
|
|
|
|
|
|
|
| |
and vector types.
e.g. "%tmp = load <2 x i64>* %ptr" can't be selected.
"%tmp = bitcast i64 %in to <2 x i32>" can't be selected.
llvm-svn: 195424
|
| |
|
|
|
|
| |
EXTRACT_SUBREG.
llvm-svn: 195408
|
| |
|
|
|
|
|
| |
They failed on bdver2 buildslave.
FIXME: FileCheck-ize them.
llvm-svn: 195407
|
| |
|
|
|
|
|
|
|
| |
The legalizer can now do this type of expansion for more
type combinations without loading and storing to and
from the stack.
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195398
|
| |
|
|
|
|
|
|
|
|
| |
on certain architectures. While generating SHLD/SHRD instructions is acceptable when optimizing for size, code optimized for speed on these platforms should instead use alternative sequences composed of add, adc, shr, shl, or and lea, which are directPath instructions. These alternative instructions not only have lower latency but also increase decode bandwidth by allowing simultaneous decoding of a third directPath instruction.
AMD processor families K7, K8, K10, K12, K15 and K16 are known to have SHLD/SHRD instructions with very poor latency. Optimization guides for these processors recommend using an alternative sequence of instructions. For these AMD processors, I disabled folding (or (x << c) | (y >> (64 - c))) when we are not optimizing for size.
It might be beneficial to disable this folding for some Intel processors as well. However, since I couldn't find specific recommendations regarding SHLD/SHRD on Intel processors, I haven't disabled this peephole for Intel.
llvm-svn: 195383
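For reference, the pattern named in the message above can be written out as a small function. This is only a sketch of the source-level expression with a hypothetical name, `doubleShiftLeft`, and it assumes a shift amount strictly between 0 and 64 so that both shifts are well defined.

```cpp
#include <cstdint>

// (x << c) | (y >> (64 - c)) for 0 < c < 64: x is shifted left and the
// vacated low bits are filled from the high bits of y. This is the
// expression that would otherwise be folded into a single SHLD.
constexpr uint64_t doubleShiftLeft(uint64_t x, uint64_t y, unsigned c) {
  return (x << c) | (y >> (64 - c));
}
```

When the fold is disabled, the two shifts and the or remain separate instructions, which is the trade-off the message describes for speed-optimized code on the listed processors.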
|
| |
|
|
| |
llvm-svn: 195365
|