| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300932 and r300930, which were causing dag-combine to
loop forever. The problem was that optimizeLogicalImm was returning
true even when there was no change to the immediate node (which happened
when the immediate was all zeros or ones), which caused dag-combine to
push and pop the same node to the work list over and over again without
making any progress.
This commit fixes the bug by returning false early in optimizeLogicalImm
if the immediate is all zeros or ones. Also, it changes the code to
compare the immediate with 0 or Mask rather than calling
countPopulation.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 301019
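The early exit described in this message can be sketched as below. This is a minimal illustration, not the code from the commit: the function name shouldTryOptimizeLogicalImm and its signature are hypothetical, and only the comparison of the immediate against 0 and Mask follows the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the early-exit check in optimizeLogicalImm.
// Returns false when the immediate is all zeros or all ones, i.e. when
// there is nothing to rewrite and the combine must report "no change".
bool shouldTryOptimizeLogicalImm(uint64_t Imm, unsigned Size) {
  assert((Size == 32 || Size == 64) && "expected a 32- or 64-bit operand");
  // All-ones mask for the operand width.
  const uint64_t Mask = (Size == 64) ? ~0ULL : ((1ULL << Size) - 1);
  Imm &= Mask;
  // Compare directly with 0 and Mask instead of counting set bits.
  if (Imm == 0 || Imm == Mask)
    return false;
  return true;
}
```

Returning false in these cases is what lets a work-list driven combiner stop revisiting the node, since a true result signals that something changed.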
|
|
|
|
|
|
|
|
|
| |
It seems that r300930 was creating an infinite loop in dag-combine when
compiling the following file:
MultiSource/Benchmarks/MiBench/consumer-typeset/z21.c
llvm-svn: 300940
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
immediate operands.
This commit adds an AArch64 dag-combine that optimizes code generation
for logical instructions taking immediate operands. The optimization
uses demanded bits to change a logical instruction's immediate operand
so that the immediate can be folded into the immediate field of the
instruction.
This recommits r300913, which broke bots because I didn't fix a call to
ShrinkDemandedConstant in SIISelLowering.cpp after changing the APIs of
TargetLoweringOpt and TargetLowering.
rdar://problem/18231627
Differential Revision: https://reviews.llvm.org/D5591
llvm-svn: 300930
|
|
|
|
|
|
|
| |
Promote them to i32 vectors to avoid unpacking and re-packing
the vectors.
llvm-svn: 300754
|
|
|
|
|
|
|
| |
Insert a VReg_1 virtual register so the i1 workaround pass
can handle it.
llvm-svn: 300113
|
|
|
|
|
|
| |
Prepare for handling non-entry functions.
llvm-svn: 299999
|
|
|
|
|
|
|
| |
Split into smaller functions and prepare for handling
non-entry functions.
llvm-svn: 299998
|
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D31786
llvm-svn: 299727
|
|
|
|
|
|
|
| |
FCOPYSIGN is lowered to bit operations which don't clear the high
bits.
llvm-svn: 299708
|
|
|
|
|
|
|
| |
This does not do what it is attempting to use it for
and requires working around in LowerFormalArguments.
llvm-svn: 299667
|
|
|
|
|
|
|
|
|
|
| |
If a workgroup size is known to be no greater than the wavefront size,
the s_barrier instruction is not needed, since all threads are guaranteed
to reach the same point at the same time.
Differential Revision: https://reviews.llvm.org/D31731
llvm-svn: 299659
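The condition in this message amounts to a single comparison. The sketch below assumes the maximum workgroup size and the wavefront size are both known as plain integers; the helper name canElideBarrier is hypothetical and does not come from the commit.

```cpp
#include <cassert>

// Hypothetical helper: true when a workgroup barrier can be dropped because
// the whole workgroup fits in one wavefront and therefore runs in lockstep.
bool canElideBarrier(unsigned MaxWorkGroupSize, unsigned WavefrontSize) {
  assert(WavefrontSize != 0 && "wavefront size must be nonzero");
  return MaxWorkGroupSize <= WavefrontSize;
}
```

For example, with a wavefront size of 64, a workgroup of at most 64 threads qualifies, while a workgroup of 256 threads still needs the barrier.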
|
|
|
|
| |
llvm-svn: 299444
|
|
|
|
| |
llvm-svn: 299391
|
|
|
|
| |
llvm-svn: 299372
|
|
|
|
|
|
|
|
|
|
| |
Add a new node to act as a fancy bitcast from f16 operations to
i32 that implicitly zeroes the high 16 bits of the result.
Alternatively, we could try making v2f16 legal and canonicalizing
on build_vectors.
llvm-svn: 299246
|
|
|
|
|
|
| |
Add scope, order, isVolatile
llvm-svn: 299122
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As we introduced the target triple environments amdgiz and amdgizcl, the address
space values are no longer enums. We have to decide the values by target triple.
The basic idea is to use struct AMDGPUAS to represent address space values.
For address space values which do not depend on the target triple, use static
const members, so that they don't occupy extra memory space and are equivalent
to compile-time constants.
Since the struct is lightweight and cheap, it can be created on the fly at
the point of use. Or it can be added as a member to a pass and created at
the beginning of the run* function.
Differential Revision: https://reviews.llvm.org/D31284
llvm-svn: 298846
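The layout described in this message might look roughly like the sketch below. Everything in it is an assumption made for illustration: the struct name AddrSpaces, the member names, the numeric values, and the makeAddrSpaces helper are hypothetical and are not the actual AMDGPUAS definition.

```cpp
#include <string>

// Hypothetical sketch of a lightweight address-space descriptor.
struct AddrSpaces {
  // Triple-independent values: static const members take no per-object
  // storage and behave like compile-time constants.
  static const unsigned Global = 1;
  static const unsigned Local = 3;
  // Triple-dependent values: chosen when the struct is created.
  unsigned Private;
  unsigned Flat;
};

// Build the descriptor from the environment component of the target triple.
// The numeric values here are illustrative only.
AddrSpaces makeAddrSpaces(const std::string &Environment) {
  const bool GenericIsZero =
      Environment == "amdgiz" || Environment == "amdgizcl";
  AddrSpaces AS;
  AS.Flat = GenericIsZero ? 0 : 4;
  AS.Private = GenericIsZero ? 5 : 0;
  return AS;
}
```

Because the struct holds only a couple of integers, it is cheap to build at the point of use or to cache as a pass member, as the message suggests.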
|
|
|
|
| |
llvm-svn: 298730
|
|
|
|
|
|
|
|
| |
This is used for a specific type of return to a shader part's
epilog code. Rename it to avoid confusion with a true
call's return.
llvm-svn: 298452
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: arsenm
Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, dstuttard, tpr
Differential Revision: https://reviews.llvm.org/D31157
llvm-svn: 298396
|
|
|
|
|
|
| |
Fold these to undef during lowering so users get eliminated.
llvm-svn: 298387
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Move backend internal intrinsics along with the rest of the
normal intrinsics, and use the Intrinsic::getDeclaration
API instead of manually constructing the type list.
It's surprising this was working before. fdiv.fast had
the wrong number of parameters. The control flow intrinsic
declaration attributes were not being applied, and
their types were inconsistent. The actual IR use types
did not match the declaration, and were closer to the
types used for the patterns. The brcond lowering
was changing the types, so introduce new nodes for those.
llvm-svn: 298119
|
|
|
|
| |
llvm-svn: 297913
|
|
|
|
| |
llvm-svn: 297662
|
|
|
|
| |
llvm-svn: 297658
|
|
|
|
| |
llvm-svn: 297557
|
|
|
|
| |
llvm-svn: 296401
|
|
|
|
| |
llvm-svn: 296396
|
|
|
|
|
|
|
| |
Add packed types as legal so they may be used with inlineasm.
Keep all operations expanded for now.
llvm-svn: 296379
|
|
|
|
| |
llvm-svn: 295908
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295904
|
|
|
|
| |
llvm-svn: 295899
|
|
|
|
|
|
| |
Fixes not adjusting using new intrinsics with chains.
llvm-svn: 295878
|
|
|
|
|
|
| |
This reverts commit r295867.
llvm-svn: 295871
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D30232
llvm-svn: 295867
|
|
|
|
|
|
| |
Convert llvm.SI.packf16 test uses
llvm-svn: 295797
|
|
|
|
|
|
|
|
|
|
|
| |
Change implementation to use max instead of add.
min/max/med3 do not flush denormals regardless of the mode,
so it is OK to use it whether or not they are enabled.
Also allow using clamp with f16, and use knowledge
of dx10_clamp.
llvm-svn: 295788
|
|
|
|
| |
llvm-svn: 295783
|
|
|
|
| |
llvm-svn: 295754
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before frame offsets are calculated, try to eliminate the
frame indexes used by SGPR spills. Then we can delete them
after.
I think for now we can be sure that no other instruction
will be re-using the same frame indexes. It should be easy
to notice if this assumption ever breaks since everything
asserts if it tries to use a dead frame index later.
The unused emergency stack slot seems to still be left behind,
so an additional 4 bytes is still wasted.
llvm-svn: 295753
|
|
|
|
| |
llvm-svn: 295554
|
|
|
|
| |
llvm-svn: 295489
|
|
|
|
| |
llvm-svn: 295358
|
|
|
|
| |
llvm-svn: 295270
|
|
|
|
|
|
| |
Update test uses with expansion in terms of new intrinsics.
llvm-svn: 295269
|
|
|
|
| |
llvm-svn: 295244
|
|
|
|
| |
llvm-svn: 294694
|
|
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D26010
llvm-svn: 294692
|
|
|
|
|
|
|
|
|
|
| |
I think this is safe as long as no inputs are known to ever
be nans.
Also add an intrinsic for fmed3 to be able to handle all safe
math cases.
llvm-svn: 293598
|
|
|
|
| |
llvm-svn: 293514
|