Summary:
Additional filtering of undesired shifts for targets that do not support them efficiently.
Related to D69116 and D69120.
Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code:
```
define i16 @testShiftBits(i16 %a) {
entry:
%and = and i16 %a, -64
%cmp = icmp eq i16 %and, 64
%conv = zext i1 %cmp to i16
ret i16 %conv
}
define i16 @testShiftBits_11(i16 %a) {
entry:
%cmp = icmp ugt i16 %a, 63
%conv = zext i1 %cmp to i16
ret i16 %conv
}
define i16 @testShiftBits_12(i16 %a) {
entry:
%cmp = icmp ult i16 %a, 64
%conv = zext i1 %cmp to i16
ret i16 %conv
}
```
The attached diff file shows the piece of code in TargetLowering that is responsible for the generation of shifts in relation to the IR above.
Before this patch, shifts were generated to replace non-legal icmp immediates. However, shifts may be undesirable if they are even more expensive for the target.
For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case the target is not suitable for showing improvements related to this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation always returns true, so the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this.
The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. This is simulated with the implementation of a command line flag that was incorporated in D69975.
This patch belongs to an initiative to "relax" the generation of shifts by LLVM for targets requiring it.
Reviewers: spatel, lebedev.ri, asl
Reviewed By: spatel
Subscribers: lenary, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69326
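For intuition, the identities behind the icmp-to-shift rewrite can be checked in isolation. A minimal stand-alone sketch (illustrative only, not the TargetLowering code itself):
```
#include <cassert>
#include <cstdint>

// The rewrites guarded by getShiftAmountThreshold(): each i16 compare above
// can be expressed through a 6-bit right shift, which is a poor trade on
// targets (like the MSP430) that shift one bit per instruction.
int main() {
  for (uint32_t A = 0; A <= 0xFFFF; ++A) {
    uint16_t X = (uint16_t)A;
    assert(((X & 0xFFC0) == 64) == ((X >> 6) == 1)); // the and/icmp eq case
    assert((X > 63) == ((X >> 6) != 0));             // the icmp ugt case
    assert((X < 64) == ((X >> 6) == 0));             // the icmp ult case
  }
  return 0;
}
```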
These checks fall out naturally from the current implementation without
needing to be explicitly considered anymore.
macros
Based on Sourabh Singh's patch D69839.
This was arbitrarily appearing in only the last section emitted, which
made tests more sensitive than they needed to be: removing the last
section (like the macinfo section change that's coming after this)
would, surprisingly, move the blank line to the previous section.
The macinfo support was broken in LTO situations: macinfo lists were
terminated only once, so multiple macinfo contributions were correctly
labeled but all continued/flowed into later contributions, with only
one terminator appearing at the end of the section.
Correctly terminate each contribution & fix the parsing to handle this
situation too. The parsing fix is also necessary for dumping linked
binaries - the previous code would stop at the end of the first
contribution - missing all later contributions in a linked binary.
It'd be nice to improve the dumping to print the offsets of each
contribution so it'd be easier to know which CU AT_macro_info refers to
which macinfo contribution.
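A rough model of the parsing fix, as a toy sketch (real DW_MACINFO entry decoding is elided and walkMacinfo is an invented name): the reader keeps consuming terminator-delimited contributions until the end of the section instead of stopping at the first terminator.
```
#include <cstdint>
#include <cstdio>
#include <vector>

struct Contribution { size_t Offset, Size; };

// Walk a (simplified) macinfo section: each contribution is a run of
// nonzero "entry" bytes closed by its own 0 terminator. Payload decoding
// is omitted; the point is the outer loop not stopping after the first 0.
std::vector<Contribution> walkMacinfo(const std::vector<uint8_t> &Section) {
  std::vector<Contribution> Result;
  size_t I = 0;
  while (I < Section.size()) {
    size_t Start = I;
    while (I < Section.size() && Section[I] != 0)
      ++I;                 // skip this contribution's entries
    if (I < Section.size())
      ++I;                 // consume the terminator
    Result.push_back({Start, I - Start});
  }
  return Result;
}

int main() {
  // Two contributions, e.g. from two CUs in a linked binary.
  std::vector<uint8_t> Section = {1, 2, 0, 3, 4, 0};
  for (const Contribution &C : walkMacinfo(Section))
    std::printf("contribution at offset %zu, size %zu\n", C.Offset, C.Size);
}
```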
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.
(I think this started showing up recently because we added an
optimization that converts pow to powi.)
Differential Revision: https://reviews.llvm.org/D69013
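For context on the pow-to-powi remark: llvm.powi takes an integer exponent, so a target-independent expansion can use exponentiation by squaring. A minimal sketch of that idea (expandPowi is a hypothetical name and INT_MIN handling is simplified):
```
#include <cstdio>

// pow(x, n) for integer n via exponentiation by squaring: O(log n)
// multiplies, with a final reciprocal for negative exponents.
double expandPowi(double X, int N) {
  bool Negative = N < 0;
  long long E = Negative ? -(long long)N : N; // widen to survive INT_MIN
  double Result = 1.0;
  while (E > 0) {
    if (E & 1)
      Result *= X;
    X *= X;
    E >>= 1;
  }
  return Negative ? 1.0 / Result : Result;
}

int main() {
  std::printf("%g\n", expandPowi(2.0, 10)); // 1024
  std::printf("%g\n", expandPowi(2.0, -2)); // 0.25
}
```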
Refactor the isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return an optional pair of destination and source machine operands.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
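The shape of that refactoring, as a stand-alone sketch with toy types (LLVM's real version deals in MachineOperands; all names below are illustrative):
```
#include <cstdio>
#include <optional>

struct Operand { int Reg; };
struct Instr { int Opcode; Operand Ops[2]; };
struct DestSourcePair { Operand Dest, Src; };

// Report "this instruction copies Src to Dest" as a value instead of via
// out-parameters; std::nullopt means "not a copy".
std::optional<DestSourcePair> isCopyInstr(const Instr &MI) {
  const int CopyOpcode = 1; // toy opcode
  if (MI.Opcode != CopyOpcode)
    return std::nullopt;
  return DestSourcePair{MI.Ops[0], MI.Ops[1]};
}

int main() {
  Instr MI{1, {{7}, {3}}};
  if (auto DS = isCopyInstr(MI))
    std::printf("copy: r%d <- r%d\n", DS->Dest.Reg, DS->Src.Reg);
}
```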
This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
details and reproducer.
> Without this change, when a nested tag type of any kind (enum, class,
> struct, union) is used as a variable type, it is emitted without
> emitting the parent type. In CodeView, parent types point to their inner
> types, and inner types do not point back to their parents. We already
> walk over all of the parent scopes to build the fully qualified name.
> This change simply requests their type indices as we go along to ensure
> they are all emitted.
>
> Fixes PR43905
>
> Reviewers: akhuang, amccarth
>
> Differential Revision: https://reviews.llvm.org/D69924
Summary:
The greedy register allocator occasionally decides to insert a large number of
unnecessary copies; see below for an example. The -consider-local-interval-cost
option (which X86 already enables by default) fixes this. We enable this option
for AArch64 only after receiving feedback that this change is not beneficial for
PowerPC.
We evaluated the impact of this change on compile time, code size and
performance benchmarks.
This option has a small impact on compile time, measured on CTMark: a
0.1% geomean regression at -O1 and -O2, and 0.2% geomean at -O3, with at
most 0.5% on individual benchmarks.
The effect on both code size and performance on AArch64 for the LLVM test suite
is nil on the geomean with individual outliers (ignoring short exec_times)
between:
              best    worst
size..text   -3.3%    +0.0%
exec_time    -5.8%    +2.3%
On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
geomean. Neither intrate nor fprate show any change in performance.
This patch makes the following changes.
- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
- Ensures that -consider-local-interval-cost=false can disable the new
behaviour if necessary.
This matrix multiply example:
$ cat test.c
long A[8][8];
long B[8][8];
long C[8][8];

void run_test() {
  for (int k = 0; k < 8; k++) {
    for (int i = 0; i < 8; i++) {
      for (int j = 0; j < 8; j++) {
        C[i][j] += A[i][k] * B[k][j];
      }
    }
  }
}
results in the following generated code on AArch64:
$ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
[...]
// %for.cond1.preheader
// =>This Inner Loop Header: Depth=1
add x14, x11, x9
str q0, [sp, #16] // 16-byte Folded Spill
ldr q0, [x14]
mov v2.16b, v15.16b
mov v15.16b, v14.16b
mov v14.16b, v13.16b
mov v13.16b, v12.16b
mov v12.16b, v11.16b
mov v11.16b, v10.16b
mov v10.16b, v9.16b
mov v9.16b, v8.16b
mov v8.16b, v31.16b
mov v31.16b, v30.16b
mov v30.16b, v29.16b
mov v29.16b, v28.16b
mov v28.16b, v27.16b
mov v27.16b, v26.16b
mov v26.16b, v25.16b
mov v25.16b, v24.16b
mov v24.16b, v23.16b
mov v23.16b, v22.16b
mov v22.16b, v21.16b
mov v21.16b, v20.16b
mov v20.16b, v19.16b
mov v19.16b, v18.16b
mov v18.16b, v17.16b
mov v17.16b, v16.16b
mov v16.16b, v7.16b
mov v7.16b, v6.16b
mov v6.16b, v5.16b
mov v5.16b, v4.16b
mov v4.16b, v3.16b
mov v3.16b, v1.16b
mov x12, v0.d[1]
fmov x15, d0
ldp q1, q0, [x14, #16]
ldur x1, [x10, #-256]
ldur x2, [x10, #-192]
add x9, x9, #64 // =64
mov x13, v1.d[1]
fmov x16, d1
ldr q1, [x14, #48]
mul x3, x15, x1
mov x14, v0.d[1]
fmov x17, d0
mov x18, v1.d[1]
fmov x0, d1
mov v1.16b, v3.16b
mov v3.16b, v4.16b
mov v4.16b, v5.16b
mov v5.16b, v6.16b
mov v6.16b, v7.16b
mov v7.16b, v16.16b
mov v16.16b, v17.16b
mov v17.16b, v18.16b
mov v18.16b, v19.16b
mov v19.16b, v20.16b
mov v20.16b, v21.16b
mov v21.16b, v22.16b
mov v22.16b, v23.16b
mov v23.16b, v24.16b
mov v24.16b, v25.16b
mov v25.16b, v26.16b
mov v26.16b, v27.16b
mov v27.16b, v28.16b
mov v28.16b, v29.16b
mov v29.16b, v30.16b
mov v30.16b, v31.16b
mov v31.16b, v8.16b
mov v8.16b, v9.16b
mov v9.16b, v10.16b
mov v10.16b, v11.16b
mov v11.16b, v12.16b
mov v12.16b, v13.16b
mov v13.16b, v14.16b
mov v14.16b, v15.16b
mov v15.16b, v2.16b
ldr q2, [sp] // 16-byte Folded Reload
fmov d0, x3
mul x3, x12, x1
[...]
With -consider-local-interval-cost the same section of code results in the
following:
$ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
[...]
.LBB0_1: // %for.cond1.preheader
// =>This Inner Loop Header: Depth=1
add x14, x11, x9
ldp q0, q1, [x14]
ldur x1, [x10, #-256]
ldur x2, [x10, #-192]
add x9, x9, #64 // =64
mov x12, v0.d[1]
fmov x15, d0
mov x13, v1.d[1]
fmov x16, d1
ldp q0, q1, [x14, #32]
mul x3, x15, x1
cmp x9, #512 // =512
mov x14, v0.d[1]
fmov x17, d0
fmov d0, x3
mul x3, x12, x1
[...]
Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
Reviewed By: dmgreen
Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69437
This reverts commit b7b170c to give the author more time to address failing tests on the expensive checks buildbots.
Without this change, when a nested tag type of any kind (enum, class,
struct, union) is used as a variable type, it is emitted without
emitting the parent type. In CodeView, parent types point to their inner
types, and inner types do not point back to their parents. We already
walk over all of the parent scopes to build the fully qualified name.
This change simply requests their type indices as we go along to ensure
they are all emitted.
Fixes PR43905
Reviewers: akhuang, amccarth
Differential Revision: https://reviews.llvm.org/D69924
optimization
This is a partial fix for the issues described in the commit message of 027aa27 (the revert of G24609). Unfortunately, I can't provide test coverage for it on its own, as the only (known) wrong example is still wrong, but due to a separate issue.
These fixes are cases where, when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
Summary:
This patch adds MIR parsing and printing for heap alloc markers, which were
added in D69136. They are printed as an operand similar to pre-/post-instr
symbols, with a heap-alloc-marker token and a metadata node.
Reviewers: rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69864
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK, so let's follow that
convention and go with G_PTR_ADD.
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
Fixes cppcheck warning.
NFCI.
Summary:
To drive the automaton we used a uint64_t as an action type. This
contained the transition's resource requirements as a conjunction:
(a OR b) AND (b OR c)
We encoded this conjunction as a sequence of four 16-bit bitmasks.
This limited the number of addressable functional units to 16, which
is quite low and has bitten many people in the past.
Instead, the DFAEmitter now generates a lookup table from InstrItinerary
class (index of the ItinData inside the ProcItineraries) to an internal
action index which is essentially a dense embedding of the conjunctive
form. Because we never materialize the conjunctive form, we no longer
have the 16 FU restriction.
In this patch we limit ourselves to 64 functional units, due to using a
uint64_t bitmask in the DFAEmitter. Now that we've decoupled these
representations, we can increase this in the future.
Reviewers: ThomasRaoux, kparzysz, majnemer
Reviewed By: ThomasRaoux
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69110
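To make the decoupling concrete: once the conjunction is kept symbolically, it can be one bitmask per OR-clause, checked against the available functional units, with the unit count bounded only by the mask width. A minimal sketch (invented names, not the DFAEmitter code):
```
#include <cstdint>
#include <cstdio>
#include <vector>

// Each element is one OR-clause (a bitmask of acceptable functional
// units); the vector as a whole is the AND of its clauses. Unlike four
// packed 16-bit masks in a uint64_t, this places no limit on the number
// of clauses and allows up to 64 units per clause.
using Conjunction = std::vector<uint64_t>;

bool canTransition(const Conjunction &Required, uint64_t AvailableFUs) {
  for (uint64_t Clause : Required)
    if ((Clause & AvailableFUs) == 0)
      return false; // some clause has no free unit
  return true;
}

int main() {
  const uint64_t A = 1 << 0, B = 1 << 1, C = 1 << 2;
  Conjunction Req = {A | B, B | C}; // (a OR b) AND (b OR c)
  std::printf("%d\n", canTransition(Req, B));     // 1: b satisfies both
  std::printf("%d\n", canTransition(Req, A));     // 0: (b OR c) fails
  std::printf("%d\n", canTransition(Req, A | C)); // 1
}
```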
NFCI.
This adds AA to Post-RA Machine Scheduling, allowing the pass more
freedom when handling memory operations.
My understanding is that this was just never done, not that it is
inherently incorrect to do so. The older PostRA List scheduler already
makes use of AA, it's just that the MI PostRA Scheduler was never taught
to use it.
Differential Revision: https://reviews.llvm.org/D69814
Summary:
The functions replaceStoreOfFPConstant() and OptimizeFloatStore() both
replace a store of a float with a store of an integer unconditionally.
However, this generates wrong code when the store being replaced is an
indexed or truncating store. This commit solves the issue by adding an
early return in these functions when the store being considered is not
a normal store.
The bug was only observed on out-of-tree targets, hence the lack of a
testcase in this commit.
Reviewers: efriedma
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68420
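For intuition on why the unconditional replacement breaks truncating stores: truncating a float's bit pattern is not a floating-point truncation. A small stand-alone illustration:
```
#include <cstdint>
#include <cstdio>
#include <cstring>

// An IEEE-754 f32 keeps its sign, exponent and high mantissa bits in the
// upper half of the word, so a 16-bit *integer* truncating store of the
// bit pattern discards exactly the bits that matter.
int main() {
  float F = 1.0f; // bit pattern 0x3F800000
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof F);
  std::printf("f32 bits 0x%08X, truncated to i16: 0x%04X\n",
              (unsigned)Bits, (unsigned)(uint16_t)Bits);
  // A real f32->f16 truncating store of 1.0 would produce 0x3C00,
  // not the 0x0000 an integer truncation yields.
}
```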
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler, as they
are using itineraries and the list scheduler seems to produce better
code (and not crash by running out of registers on v6m code). So whether
to use the MIScheduler or not is checked at runtime from the subtarget
features.
This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine scheduler is decided as
the pass manager is set up, in ARM's case from a newly constructed
subtarget. In some situations, like LTO, this won't include the correct
cpu, so it can pick the wrong option. This can have a surprising effect
on performance.
To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add an enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.
Thanks to David Penry for identifying this problem.
Differential Revision: https://reviews.llvm.org/D69775
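The hook structure described above, reduced to a toy sketch (invented fields; in LLVM these methods would live on the subtarget):
```
#include <cstdio>

struct Subtarget {
  bool UseMISched = false;
  bool UsePostRA = false;
  bool enableMachineScheduler() const { return UseMISched; }
  bool enablePostRAScheduler() const { return UsePostRA; }
  // Default behaviour per the description: run the post-RA machine
  // scheduler when both flags are on; a target can override the logic
  // to force exactly one of the two schedulers.
  bool enablePostRAMachineScheduler() const {
    return enableMachineScheduler() && enablePostRAScheduler();
  }
};

// Both schedulers are added to the pipeline; which one runs is decided
// at runtime from the actual subtarget, so LTO gets the right choice.
void runPostRA(const Subtarget &ST) {
  if (ST.enablePostRAMachineScheduler())
    std::puts("post-RA machine scheduler");
  else if (ST.enablePostRAScheduler())
    std::puts("old post-RA list scheduler");
}

int main() {
  Subtarget MISchedCore{true, true};    // e.g. a core with a machine model
  Subtarget ItineraryCore{false, true}; // e.g. an itinerary-based core
  runPostRA(MISchedCore);
  runPostRA(ItineraryCore);
}
```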
With a few things fixed:
- initialisation of the optimisation remark pass (this was causing the buildbot
failures on PPC),
- a test case.
Differential Revision: https://reviews.llvm.org/D69660
Summary:
- Define Instruction::Freeze, making it a UnaryOperator
- Add support for freeze to LLLexer/LLParser/BitcodeReader/BitcodeWriter
The format is `%x = freeze <ty> %v`
- Add support for freeze instruction to llvm-c interface.
- Add m_Freeze in PatternMatch.
- Erase freeze when lowering IR to SelDag.
Reviewers: deadalnix, hfinkel, efriedma, lebedev.ri, nlopes, jdoerfert, regehr, filcab, delcypher, whitequark
Reviewed By: lebedev.ri, jdoerfert
Subscribers: jfb, kristof.beyls, hiraditya, lebedev.ri, steven_wu, dexonsmith, xbolva00, delcypher, spatel, regehr, trentxintong, vsk, filcab, nlopes, mehdi_amini, deadalnix, llvm-commits
Differential Revision: https://reviews.llvm.org/D29011
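A quick way to exercise the new instruction is through IRBuilder; a minimal sketch (assuming a CreateFreeze helper accompanies this work; link against LLVM's Core library):
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Emits: define i32 @demo(i32 %v) { %x = freeze i32 %v; ret i32 %x }
int main() {
  LLVMContext Ctx;
  Module M("freeze_demo", Ctx);
  auto *FnTy = FunctionType::get(Type::getInt32Ty(Ctx),
                                 {Type::getInt32Ty(Ctx)}, false);
  auto *F = Function::Create(FnTy, Function::ExternalLinkage, "demo", M);
  IRBuilder<> B(BasicBlock::Create(Ctx, "entry", F));
  Value *Frozen = B.CreateFreeze(&*F->arg_begin(), "x");
  B.CreateRet(Frozen);
  M.print(outs(), nullptr);
}
```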
Continuation of:
D69116
Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
See also D69099 and D69116
Use the TLI hook in DAGCombiner.cpp to guard against creating
shift nodes that are not optimal for a target.
Patch by: @joanlluch (Joan LLuch)
Differential Revision: https://reviews.llvm.org/D69120
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments. That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.
Extracted from the patch for D69281.
NFC.
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers
without aliases that are neither allocatable nor reserved, such as the
SystemZ::CC register.
The MachineVerifier previously only caught the case of a live-in use
without an entry in the live-in list (as "using an undefined physical
register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
https://reviews.llvm.org/D68267
size isn't power of 2
Summary:
For the test case below, we get an assertion failure on all targets except AArch64 and ARM:
declare i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
define i8 @test_v3i8(<3 x i8> %a) nounwind {
  %b = call i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
  ret i8 %b
}
In the function getShuffleReduction(), we can see that it requires the vector size to be a power of 2.
This patch fixes that error for the llvm.experimental.vector.reduce.* functions when the number of elements is not a power of 2.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D68625
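Conceptually, a shuffle reduction wants a power-of-2 lane count, so a non-power-of-2 input can first be padded with the operation's identity element (all-ones for AND). A stand-alone sketch of that idea, assumed to mirror the spirit of the fix rather than quoting it:
```
#include <cstdint>
#include <cstdio>
#include <vector>

// AND-reduce N lanes by padding to the next power of 2 with the AND
// identity (0xFF for i8), then repeatedly halving, the way a
// shuffle-based reduction does.
uint8_t reduceAnd(std::vector<uint8_t> Lanes) {
  size_t N = 1;
  while (N < Lanes.size())
    N *= 2;
  Lanes.resize(N, 0xFF); // identity padding leaves the result unchanged
  while (N > 1) {
    N /= 2;
    for (size_t I = 0; I < N; ++I)
      Lanes[I] &= Lanes[I + N];
  }
  return Lanes[0];
}

int main() {
  std::printf("0x%02X\n", reduceAnd({0xF3, 0xB7, 0x37})); // 0x33
}
```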
More code reuse, better basis for modelling
debug_loc/loclists/loclists.dwo emission support.
== / != 1 -> (setcc) != / == (setcc) to the right place
We need to be checking the value types for the inner setccs not
the outer setcc. We need to ensure those setccs produce a 0/1
value or that the xor is on the i1 type. I think at the time
this code was originally written, getBooleanContents didn't
take any arguments so this was probably correct. But now we can
have a different boolean contents for integer and floating point.
Not sure why the other combines below the xor were also checking
the boolean contents. None of them involve any setccs other than
the outer one and they only produce a new setcc.
Differential Revision: https://reviews.llvm.org/D69480
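The 0/1 requirement is easy to check in isolation: XOR with 1 inverts a 0/1 boolean but not a 0/-1 one (as produced under ZeroOrNegativeOneBooleanContent). A small illustration:
```
#include <cstdio>

// The combine rewrites (setcc ^ 1) as the inverted setcc, which is only
// sound when setcc yields 0/1: XOR with 1 flips 0 <-> 1, but applied to
// a 0/-1 boolean it produces a non-boolean value instead of the inverse.
int main() {
  for (int B : {0, 1})  // ZeroOrOne booleans: B ^ 1 is the inverse
    std::printf("0/1:  %2d ^ 1 = %2d\n", B, B ^ 1);
  for (int B : {0, -1}) // ZeroOrNegativeOne booleans: B ^ 1 is not
    std::printf("0/-1: %2d ^ 1 = %2d\n", B, B ^ 1);
}
```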
before checking for begin
If there are debug instructions before the stopping point,
we need to skip over them before checking for begin, in order
to avoid having the debug instructions affect behavior.
Fixes PR43758.
Differential Revision: https://reviews.llvm.org/D69606
Summary:
The general Function::hasAddressTaken has two issues that make it
inappropriate for our purposes:
1. it is sensitive to dead constant users (PR43858 / crbug.com/1019970),
leading to different codegen when debug info is enabled
2. it considers direct calls via a function cast to be address escapes
The first is fixable, but the second is not, because IPO clients rely on
this behavior. They assume this function means that all call sites are
analyzable for IPO purposes.
So, implement our own analysis, which gets closer to finding functions
that may be indirect call targets.
Reviewers: ajpaverd, efriedma, hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69676
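The replacement analysis presumably follows the usual use-walking pattern; a rough sketch against LLVM's C++ API (not the committed code, and the handling of calls through a function cast from point 2 is elided):
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Conservatively decide whether F may be an indirect call target: any use
// other than the callee slot of a direct call counts as an escape, but
// dead constant users (issue 1 above) are ignored.
static bool mayBeIndirectCallTarget(const Function &F) {
  for (const Use &U : F.uses()) {
    const User *FU = U.getUser();
    if (const auto *C = dyn_cast<Constant>(FU))
      if (C->use_empty())
        continue; // dead constant user: irrelevant to codegen
    if (const auto *CB = dyn_cast<CallBase>(FU))
      if (CB->isCallee(&U))
        continue; // F is directly called here; its address doesn't escape
    return true;  // stored, compared, passed as data, ...
  }
  return false;
}
```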
Summary:
Make sure RAGreedy informs LiveDebugVariables about new VRegs
that are introduced at spill time by the InlineSpiller.
Consider this example:
LDV: !"var" [48r;128r):0 Loc0=%2
48B %2 = ...
...
128B %7 = ADD %2, ...
If %2 is spilled the InlineSpiller will insert spill/reload
instructions and introduces some new vregs. So we get
48B %4 = ...
56B spill %4
...
120B reload %5
128B %3 = ADD %5, ...
In the past we did not inform LDV about this, and when reintroducing
DBG_VALUE instruction LDV still got information that "var" had the
location of the spilled register %2 for the interval [48r;128r).
The result was bad, since we mapped "var" to the spill slot even
before the spill happened:
%4 = ...
DBG_VALUE %spill.0, !"var"
spill %4 to %spill.0
...
reload %5
%3 = ADD %5, ...
This patch will inform LDV about the interval split introduced
due to spilling. So the location map in LDV will become
!"var" [48r;56r):1 [56r;120r):0 [120r;128r):2 Loc0=%2 Loc1=%4 Loc2=%5
And when inserting DBG_VALUE instructions we get
%4 = ...
DBG_VALUE %4, !"var"
spill %4 to %spill.0
DBG_VALUE %spill.0, !"var"
...
reload %5
DBG_VALUE %5, !"var"
%3 = ADD %5, ...
Fixes: https://bugs.llvm.org/show_bug.cgi?id=38899
Reviewers: jmorse, vsk, aprantl
Reviewed By: jmorse
Subscribers: dstenb, wuzish, MatzeB, qcolombet, nemanjai, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69584
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
D69580/llvmorg-10-init-8797-g0d987e411ac
Move TargetLoweringBase::isSuitableForJumpTable from
llvm/CodeGen/TargetLowering.h to .cpp, to avoid the undefined reference
from all LLVM${Target}ISelLowering.cpp.
Another fix is to add a dependency on TransformUtils to all
lib/Target/$Target/LLVMBuild.txt, but that is too disruptive.
Summary:
(Split of off D67120)
TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile
guided size optimization.
Reviewers: davidxl
Subscribers: eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69580
destination and source pair as a return value; NFC
This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375
Summary:
If a wrapper around one of the mem* stdlib functions bitcasts the returned
pointer value before returning it (e.g. to a wchar_t*), LLVM does not emit a
tail call.
Add a check for this scenario so that we emit a tail call.
Reviewers: wmi, mkuper, ramred01, dmgreen
Reviewed By: wmi, dmgreen
Subscribers: hiraditya, sanwou01, javed.absar, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59078
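The scenario in source form, as a stand-alone illustration of such a wrapper (whether the tail call is actually emitted of course depends on target and options):
```
#include <cstdio>
#include <cstring>
#include <cwchar>

// memcpy returns void*; the cast back to wchar_t* is the "bitcast of the
// returned pointer" that used to block tail-call emission.
wchar_t *wmemcpy_wrapper(wchar_t *Dst, const wchar_t *Src, std::size_t N) {
  return static_cast<wchar_t *>(std::memcpy(Dst, Src, N * sizeof(wchar_t)));
}

int main() {
  wchar_t Src[] = L"hi", Dst[3];
  std::printf("%ls\n", wmemcpy_wrapper(Dst, Src, 3));
}
```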
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.