|  | Commit message (Collapse) | Author | Age | Files | Lines | 
|---|
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | We'd like to remove this whole function, because these are properties of
functions, not the target as a whole. These two are easy to remove
because they are only used for emitting ARM build attributes, which
expects them to represent the defaults for the whole module, not just
the last function generated.
This is needed to get correct build attributes when using IPRA on ARM,
because IPRA causes resetTargetOptions to get called before
ARMAsmPrinter::emitAttributes.
Differential revision: https://reviews.llvm.org/D64929
llvm-svn: 366562 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
According to the new Armv8-M specification
https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf the
instructions SQRSHRL and UQRSHLL now have an additional immediate
operand <saturate>. The new assembly syntax is:
SQRSHRL<c> RdaLo, RdaHi, #<saturate>, Rm
UQRSHLL<c> RdaLo, RdaHi, #<saturate>, Rm
where <saturate> can be either 64 (the existing behavior) or 48, in
that case the result is saturated to 48 bits.
The new operand is encoded as follows:
  #64 Encoded as sat = 0
  #48 Encoded as sat = 1
sat is bit 7 of the instruction bit pattern.
This patch adds a new assembler operand class MveSaturateOperand which
implements parsing and encoding. Decoding is implemented in
DecodeMVEOverlappingLongShift.
Reviewers: ostannard, simon_tatham, t.p.northover, samparker, dmgreen, SjoerdMeijer
Reviewed By: simon_tatham
Subscribers: javed.absar, kristof.beyls, hiraditya, pbarrio, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64810
llvm-svn: 366555 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Change the scan algorithm to use only power-of-two shifts (1, 2, 4, 8,
16, 32) instead of starting off shifting by 1, 2 and 3 and then doing
a 3-way ADD, because:
1. It simplifies the compiler a little.
2. It minimizes vgpr pressure because each instruction is now of the
   form vn = vn + vn << c.
3. It is more friendly to the DPP combiner, which currently can't
   combine into an ADD3 instruction.
Because of #2 and #3 the end result is improved from this:
  v_add_u32_dpp v4, v3, v3  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  v_mov_b32_dpp v5, v3  row_shr:2 row_mask:0xf bank_mask:0xf
  v_mov_b32_dpp v1, v3  row_shr:3 row_mask:0xf bank_mask:0xf
  v_add3_u32 v1, v4, v5, v1
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf
To this:
  v_add_u32_dpp v1, v1, v1  row_shr:1 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:2 row_mask:0xf bank_mask:0xf bound_ctrl:0
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:4 row_mask:0xf bank_mask:0xe
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_shr:8 row_mask:0xf bank_mask:0xc
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:15 row_mask:0xa bank_mask:0xf
  s_nop 1
  v_add_u32_dpp v1, v1, v1  row_bcast:31 row_mask:0xc bank_mask:0xf
I.e. two fewer computational instructions, one extra nop where we could
schedule something else.
Reviewers: arsenm, sheredom, critson, rampitec, vpykhtin
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64411
llvm-svn: 366543 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.
There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.
Differential Revision: https://reviews.llvm.org/D58335
llvm-svn: 366524 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | and legalize later.
I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't
do that unless we delay lowering to actual function calls. This patch changes
the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and
then have each target specify that using the new custom legalizer for intrinsics
hook that they want it expanded it a libcall.
Differential Revision: https://reviews.llvm.org/D64895
llvm-svn: 366516 | 
| | 
| 
| 
| 
| 
| 
| 
| | This allows to reduce generated AMDGPUGenAsmWriter.inc by ~100Kb.
Differential Revision: https://reviews.llvm.org/D64952
llvm-svn: 366505 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Add support for folding G_GEPs into loads of the form
```
ldr reg, [base, off]
```
when possible. This can save an add before the load. Currently, this is only
supported for loads of 64 bits into 64 bit registers.
Add a new addressing mode function, `selectAddrModeRegisterOffset` which
performs this folding when it is profitable.
Also add a test for addressing modes for G_LOAD.
Differential Revision: https://reviews.llvm.org/D64944
llvm-svn: 366503 | 
| | 
| 
| 
| 
| 
| 
| 
| | This reverts r366441 (git commit 48104ef7c9c653bbb732b66d7254957389fea337)
This causes clang to fail to compile some file in Skia. Reduction soon.
llvm-svn: 366501 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Properly generate the outchain for the `__builtin_wasm_tls_base` intrinsic.
Also marked the intrinsic pure, per @sunfish's suggestion.
Reviewers: tlively, aheejin, sbc100, sunfish
Reviewed By: tlively
Subscribers: dschuff, jgravelle-google, hiraditya, cfe-commits, llvm-commits, sunfish
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64949
llvm-svn: 366499 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Add `__builtin_wasm_tls_base` so that LeakSanitizer can find the thread-local
block and scan through it for memory leaks.
Reviewers: tlively, aheejin, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64900
llvm-svn: 366475 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D64683
llvm-svn: 366462 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | There doesn't seem to be a practical reason for these instructions to have
different restrictions on the types of relocations that they may be used
with, notwithstanding the language in the ELF AArch64 spec that implies that
specific relocations are meant to be used with specific instructions.
For example, we currently forbid the first instruction in the following
sequence, despite it currently being used by clang to generate a global
reference under -mcmodel=large:
	movz	x0, #:abs_g0_nc:foo
	movk	x0, #:abs_g1_nc:foo
	movk	x0, #:abs_g2_nc:foo
	movk	x0, #:abs_g3:foo
Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations
that they currently accept individually.
Differential Revision: https://reviews.llvm.org/D64466
llvm-svn: 366461 | 
| | 
| 
| 
| 
| 
| | This reverts commit 17e3cbf5fe656483d9016d0ba9e1d0cd8629379e.
llvm-svn: 366444 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | It is necessary to generate fixups in .debug_frame or .eh_frame as
relaxation is enabled due to the address delta may be changed after
relaxation.
There is an opcode with 6-bits data in debug frame encoding. So, we
also need 6-bits fixup types.
Differential Revision: https://reviews.llvm.org/D58335
llvm-svn: 366442 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.
A helper function, findEltLoadSrc, recurses to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.
Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.
Differential Revision: https://reviews.llvm.org/D64551
llvm-svn: 366441 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | LEA doesn't affect flags, so use it more liberally to replace an ADD when
we know that the ADD operands affect flags.
In the motivating example from PR40483:
https://bugs.llvm.org/show_bug.cgi?id=40483
...this lets us avoid duplicating a math op just to avoid flag conflict.
As mentioned in the TODO comments, this heuristic can be extended to
fire more often if that leads to more improvements.
Differential Revision: https://reviews.llvm.org/D64707
llvm-svn: 366431 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
PerformVMOVRRDCombine ommits adding a offset
of 4 to the PointerInfo, when converting a
f64 = load[M]
to
{i32, i32} = {load[M], load[M + 4]}
Which would allow the machine scheduller
to break dependencies with the second load.
 - pr42638
Reviewers: eli.friedman, dmgreen, ostannard
Reviewed By: ostannard
Subscribers: ostannard, javed.absar, kristof.beyls, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64870
llvm-svn: 366423 | 
| | 
| 
| 
| 
| 
| 
| | We insered PHIS were there were none before, so the property must be
reset. This error was found on an EXPENSIVE_CHECKS build.
llvm-svn: 366412 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | I'm not convinced the code this calls is properly vetted for
vXi1 vectors. Experimental vector widening legalization testing
for D55251 is now hitting an assertion failure inside
EltsFromConsecutiveLoads. This is occurring from a v2i1 load
having a store size different than its VT size. Hopefully
this commit will keep such issues from happening.
llvm-svn: 366405 | 
| | 
| 
| 
| 
| 
| | Found by UBSan.
llvm-svn: 366398 | 
| | 
| 
| 
| 
| 
| | Issue found by ASan.
llvm-svn: 366397 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | -DBUILD_SHARED_LIBS=on link error after D64173/r366361
This fixes:
ld.lld: error: undefined symbol: llvm::findAllocaForValue(llvm::Value*, llvm::DenseMap<llvm::Value*, llvm::Alloc aInst*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::AllocaInst*> >&)
>>> referenced by AArch64StackTagging.cpp
llvm-svn: 366396 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D64892
llvm-svn: 366385 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| | min-legal-vector-width=256 is in effect.
This started triggering an assertion after r364718 when we made
these Custom under AVX2.
llvm-svn: 366382 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D64885
llvm-svn: 366376 | 
| | 
| 
| 
| 
| 
| 
| | Depending on the evaluation order of function call arguments,
the current code may insert a use before def.
llvm-svn: 366375 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Use MTE intrinsics to tag stack variables in functions with
sanitize_memtag attribute.
Reviewers: pcc, vitalybuka, hctim, ostannard
Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64173
llvm-svn: 366361 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Implement IR intrinsics for stack tagging. Generated code is very
unoptimized for now.
Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp are
used to implement a tagged stack frame pointer in a virtual register.
Differential Revision: https://reviews.llvm.org/D64172
llvm-svn: 366360 | 
| | 
| 
| 
| 
| 
| | This reverts r366322 (git commit 4b8da3a503e434ddbc08ecf66582475765f449bc)
llvm-svn: 366355 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Since the target has no significant advantage of vectorization,
vector instructions bous threshold bonus should be optional.
amdgpu-inline-arg-alloca-cost parameter default value and the target
InliningThresholdMultiplier value tuned then respectively.
Reviewers: arsenm, rampitec
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64642
llvm-svn: 366348 | 
| | 
| 
| 
| 
| 
| | Avoids creating an extra intermediate mov.
llvm-svn: 366340 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | configurable and use for RISC-V
The original behavior was to always emit the offsets to each call site in the
call site table as uleb128 values, however on some architectures (eg RISCV)
these uleb128 offsets into the code cannot always be resolved until link time
(because relaxation will invalidate any calculated offsets), and there are no
appropriate relocations for uleb128 values. As a consequence it needs to be
possible to specify an alternative.
This also switches RISCV to use DW_EH_PE_udata4 for call side encodings in
.gcc_except_table
Differential Revision: https://reviews.llvm.org/D63415
Patch by Edward Jones.
llvm-svn: 366329 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: Extend the atomic optimizer to handle AND, OR and XOR.
Reviewers: arsenm, sheredom
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, jfb, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64809
llvm-svn: 366323 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | TME is a future architecture technology, documented in
https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
https://developer.arm.com/docs/ddi0601/a
More about the future architectures:
https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture
This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and
TCANCEL and the target feature/arch extension "tme".
It also implements TME builtin functions, defined in ACLE Q2 2019
(https://developer.arm.com/docs/101028/latest)
Patch by Javed Absar and Momchil Velikov
Differential Revision: https://reviews.llvm.org/D64416
llvm-svn: 366322 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Missed in the original commit, use the correct callee-saved register
list for spilling, instead of the standard SVR432 list.  This avoids
needlessly spilling the SPE non-volatile registers when they're not used.
As part of this, also add where missing, and sort, the spill opcode
checks for SPE and SPE4 register classes.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D56703
llvm-svn: 366319 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset.  However, the instructions
`evstdd` and `evldd` require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fix into a
5-bit offset, which ranges from 0-31 (indexed in double-words).
The update to the register spill test is taken partially from the test
case shown in D49754.
Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted.  This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.
Part of the patch contributed by Kei Thomsen.
Reviewers: nemanjai, hfinkel, joerg
Subscribers: kbarton, jsji, llvm-commits
Differential Revision: https://reviews.llvm.org/D54409
llvm-svn: 366318 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Add narrowScalar to half of original size for G_ICMP.
ClampScalar G_ICMP's operands 2 and 3 to to s32.
Select G_ICMP for pointers for MIPS32. Pointer compare is same
as for integers, it is enough to declare them as legal type.
Differential Revision: https://reviews.llvm.org/D64856
llvm-svn: 366317 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: Change-Id: I854fbf7d48e937bef9f8f3f5d0c8aeb970652630
Reviewers: rampitec, mareko
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64807
Change-Id: I4405b3a7f84186acea5a78d291bff71056e745fc
llvm-svn: 366314 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary: GDS cannot alias anything else.
Original patch by: Marek Olšák
Reviewers: arsenm, mareko
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64114
Change-Id: I07bfbd96f5d5c37a6dfba7997df12f291dd794b0
llvm-svn: 366313 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| | Migrate CallLowering::lowerReturnVal to use the same infrastructure as
lowerCall/FormalArguments and remove the now obsolete code path from
splitToValueTypes.
Forgot to push this earlier.
llvm-svn: 366308 | 
| | 
| 
| 
| 
| 
| | The `MUL` instruction is available starting from the MIPS32/MIPS64 targets.
llvm-svn: 366301 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This directive forces to use the alternate register for context pointer.
For example, this code:
  .cplocal $4
  jal foo
expands to:
  ld    $25, %call16(foo)($4)
  jalr  $25
Differential Revision: https://reviews.llvm.org/D64743
llvm-svn: 366300 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | As well as other LLVM targets we do not handle "offsettable"
memory addresses in any special way. In other words, the "o" constraint
is an exact equivalent of the "m" one. But some existing code require
the "o" constraint support.
This fixes PR42589.
Differential Revision: https://reviews.llvm.org/D64792
llvm-svn: 366299 | 
| | 
| 
| 
| 
| 
| | Differential Revision: https://reviews.llvm.org/D64839
llvm-svn: 366283 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Currently, on Emscripten, dynamic linking is not supported with threads.
This means that if thread-local storage is used, it must be used in a
statically-linked executable. Hence, local-exec is the only possible model.
This diff compiles all TLS variables to use local-exec on Emscripten as a
temporary measure until dynamic linking is supported with threads.
The goal for this is to allow C++ types with constructors to be thread-local.
Currently, when `clang` compiles a `thread_local` variable with a constructor,
it generates `__tls_guard` variable:
    @__tls_guard = internal thread_local global i8 0, align 1
As no TLS model is specified, this is treated as general-dynamic, which we do
not support (and cannot support without implementing dynamic linking support
with threads in Emscripten). As a result, any C++ constructor in `thread_local`
variables would not compile.
By compiling all `thread_local` as local-exec, `__tls_guard` will compile and
we can support C++ constructors with TLS without implementing dynamic linking
with threads.
Depends on D64537
Reviewers: tlively, aheejin, sbc100
Reviewed By: aheejin
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64776
llvm-svn: 366275 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
Thread local variables are placed inside a `.tdata` segment. Their symbols are
offsets from the start of the segment. The address of a thread local variable
is computed as `__tls_base` + the offset from the start of the segment.
`.tdata` segment is a passive segment and `memory.init` is used once per thread
to initialize the thread local storage.
`__tls_base` is a wasm global. Since each thread has its own wasm instance,
it is effectively thread local. Currently, `__tls_base` must be initialized
at thread startup, and so cannot be used with dynamic libraries.
`__tls_base` is to be initialized with a new linker-synthesized function,
`__wasm_init_tls`, which takes as an argument a block of memory to use as the
storage for thread locals. It then initializes the block of memory and sets
`__tls_base`. As `__wasm_init_tls` will handle the memory initialization,
the memory does not have to be zeroed.
To help allocating memory for thread-local storage, a new compiler intrinsic
is introduced: `__builtin_wasm_tls_size()`. This instrinsic function returns
the size of the thread-local storage for the current function.
The expected usage is to run something like the following upon thread startup:
    __wasm_init_tls(malloc(__builtin_wasm_tls_size()));
Reviewers: tlively, aheejin, kripken, sbc100
Subscribers: dschuff, jgravelle-google, hiraditya, sunfish, jfb, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D64537
llvm-svn: 366272 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | This is part of what is requested by PR42023:
https://bugs.llvm.org/show_bug.cgi?id=42023
There's an extension needed for FP add, but exactly how we would specify
that using flags is not clear to me, so I left that as a TODO.
We're still missing patterns for partial reductions when the input vector
is 256-bit or 512-bit, but I think that's a failure of vector narrowing.
If we can reduce the widths, then this matching should work on those tests.
Differential Revision: https://reviews.llvm.org/D64760
llvm-svn: 366268 | 
| | 
| 
| 
| | llvm-svn: 366257 | 
| | 
| 
| 
| | llvm-svn: 366256 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| | Summary:
This is exposed by our internal testing.
The reduced testcase will assert with "Impossible reg-to-reg copy"
We can't use COPY to do 32-bit to 64-bit conversion.
Reviewers: kbarton, hfinkel, nemanjai
Reviewed By: hfinkel
Subscribers: hiraditya, MaskRay, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64499
llvm-svn: 366255 |