bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[ARM] Prefer lsls+lsrs over lsls+ands or lsrs+ands in Thumb1.	Eli Friedman	2018-07-25	1	-0/+81
\| \| \| \| \| \| \| \| \| \| \| \| \|	Saves materializing the immediate for the "ands". Corresponding patterns exist for lsrs+lsls, but that seems less common in practice. Now implemented as a DAGCombine. Differential Revision: https://reviews.llvm.org/D49585 llvm-svn: 337945
*	ARM: stop explicitly marking armv7k libcalls as hard-float. NFC.	Tim Northover	2018-07-18	1	-7/+0
\| \| \| \| \| \| \|	Since the triple's default is hard float, the libcalls will already use VFP registers. llvm-svn: 337386
*	[ARM] Treat cmn immediates as legal in isLegalICmpImmediate.	Eli Friedman	2018-07-10	1	-4/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	The original code attempted to do this, but the std::abs() call didn't actually do anything due to implicit type conversions. Fix the type conversions, and perform the correct check for negative immediates. This probably has very little practical impact, but it's worth fixing just to avoid confusion in the future, I think. Differential Revision: https://reviews.llvm.org/D48907 llvm-svn: 336742
*	[NEON] Fix combining of vldx_dup intrinsics with updating of base addresses	Ivan A. Kosarev	2018-07-05	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Resolves: Unsupported ARM Neon intrinsics in Target-specific DAG combine function for VLDDUP https://bugs.llvm.org/show_bug.cgi?id=38031 Related diff: D48439 Differential Revision: https://reviews.llvm.org/D48920 llvm-svn: 336325
*	[ARM] Fix PR37382: Don't optimize mul.with.overflow on thumbv6m.	Vadzim Dambrouski	2018-07-02	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: efriedma, rogfer01, javed.absar Reviewed By: efriedma, rogfer01 Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D48846 llvm-svn: 336144
*	[NEON] Support vldNq intrinsics in AArch32 (LLVM part)	Ivan A. Kosarev	2018-06-27	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733
*	[NEON] Support VST1xN intrinsics in AArch32 mode (LLVM part)	Ivan A. Kosarev	2018-06-10	1	-0/+24
\| \| \| \| \| \| \| \| \|	We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47447 llvm-svn: 334361
*	[NEON] Support VLD1xN intrinsics in AArch32 mode (LLVM part)	Ivan A. Kosarev	2018-06-02	1	-0/+18
\| \| \| \| \| \| \| \| \|	We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47120 llvm-svn: 333825
*	Revert r333819 "[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)"	Ivan A. Kosarev	2018-06-02	1	-18/+0
\| \| \| \| \| \| \| \|	The LLVM part was committed instead of the Clang part. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333824
*	[NEON] Support VLD1xN intrinsics in AArch32 mode (Clang part)	Ivan A. Kosarev	2018-06-02	1	-0/+18
\| \| \| \| \| \| \| \| \|	We currently support them only in AArch64. The NEON Reference, however, says they are 'ARMv7, ARMv8' intrinsics. Differential Revision: https://reviews.llvm.org/D47121 llvm-svn: 333819
*	[ARM] Remove code handling ADDC/ADDE/SUBC/SUBE	Amaury Sechet	2018-05-30	1	-30/+0
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: This code is now dead as the ARM backend uses ADDCARRY/SUBCARRY/SETCCCARRY . Reviewers: rogfer01, efriedma, rengolin, javed.absar Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D47413 llvm-svn: 333544
*	[ARM] Enable SETCCCARRY lowering for Thumb1.	Eli Friedman	2018-05-29	1	-3/+1
\| \| \| \| \| \| \| \| \|	We've had Thumb1 support for ARMISD::SUBE for a while now, so this just works. Reduces codesize a bit for 64-bit integer comparisons. Differential Revision: https://reviews.llvm.org/D47387 llvm-svn: 333445
*	ARM: be conservative when asked load/store alignment of weird type.	Tim Northover	2018-05-21	1	-0/+4
\| \| \| \| \| \| \|	Chances are we'll be asked again after type legalization, but before that point it's better to claim misaligned accesses aren't allowed than to assert. llvm-svn: 332840
*	Rename DEBUG macro to LLVM_DEBUG.	Nicola Zaghen	2018-05-14	1	-9/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' \| xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master \| ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
*	[ARM] Add support for SETCCCARRY instead of SETCCE	Amaury Sechet	2018-05-09	1	-5/+12
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: As per title. SETCCE is deprecated and will eventually be removed. Reviewers: rogfer01, efriedma, rengolin, javed.absar Subscribers: kristof.beyls, chrib, llvm-commits Differential Revision: https://reviews.llvm.org/D46512 llvm-svn: 331929
*	[ARM] Select result 1 from ConvertBooleanCarryToCarryFlag's result ↵	Amaury Sechet	2018-05-07	1	-4/+6
\| \| \| \| \| \| \| \|	automatically. NFC The old behavior return the value 0, which is error prone. llvm-svn: 331614
*	ARM: don't try to over-align large vectors as arguments.	Tim Northover	2018-05-03	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \|	By default LLVM thinks very large vectors get aligned to their size when passed across functions. Unfortunately no-one told the ARM backend so it doesn't trigger stack realignment and so accesses can cause the usual misalignment issues (e.g. a data abort). This changes the ABI alignment to the stack alignment, which in practice (and as a bonus) also coincides with the alignment "natural" vectors get. llvm-svn: 331451
*	Remove \brief commands from doxygen comments.	Adrian Prantl	2018-05-01	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've been running doxygen with the autobrief option for a couple of years now. This makes the \brief markers into our comments redundant. Since they are a visual distraction and we don't want to encourage more \brief markers in new code either, this patch removes them all. Patch produced by for i in $(git grep -l '\\brief'); do perl -pi -e 's/\\brief //g' $i & done Differential Revision: https://reviews.llvm.org/D46290 llvm-svn: 331272
*	[ARM] FP16 vmaxnm/vminnm scalar instructions	Sjoerd Meijer	2018-04-13	1	-0/+5
\| \| \| \| \| \| \| \| \|	This adds code generation support for the FP16 vmaxnm/vminnm scalar instructions. Differential Revision: https://reviews.llvm.org/D44675 llvm-svn: 330034
*	[ARM] FP16 VSEL codegen	Sjoerd Meijer	2018-04-11	1	-4/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow up of rL327695 to instruction select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT. More work is required in this area, which will be addressed soon: - more variants need to be regression tested, but this depends on the next point. - first LowerConstantFP need to be adjusted for fp16 values. Differential Revision: https://reviews.llvm.org/D45205 llvm-svn: 329788
*	[IR][CodeGen] Remove dependency on EVT from IR/Function.cpp. Move EVT to ↵	Craig Topper	2018-03-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	CodeGen layer. Currently EVT is in the IR layer only because of Function.cpp needing a very small piece of the functionality of EVT::getEVTString(). The rest of EVT is used in codegen making CodeGen a better place for it. The previous code converted a Type* to EVT and then called getEVTString. This was only expected to handle the primitive types from Type*. Since there only a few primitive types, we can just print them as strings directly. Differential Revision: https://reviews.llvm.org/D45017 llvm-svn: 328806
*	[ARM] Support float literals under XO	Christof Douma	2018-03-28	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Follow up patch of r328313 to support the UseVMOVSR constraint. Removed some unneeded instructions from the test and removed some stray comments. Differential Revision: https://reviews.llvm.org/D44941 llvm-svn: 328691
*	Fix layering by moving ValueTypes.h from CodeGen to IR	David Blaikie	2018-03-23	1	-1/+1
\| \| \| \| \| \|	ValueTypes.h is implemented in IR already. llvm-svn: 328397
*	Fix layering of MachineValueType.h by moving it from CodeGen to Support	David Blaikie	2018-03-23	1	-1/+1
\| \| \| \| \| \| \| \| \|	This is used by llvm tblgen as well as by LLVM Targets, so the only common place is Support for now. (maybe we need another target for these sorts of things - but for now I'm at least making them correct & we can make them better if/when people have strong feelings) llvm-svn: 328395
*	[ARM] Support float literals under XO	Christof Douma	2018-03-23	1	-12/+25
\| \| \| \| \| \| \| \| \| \| \| \| \|	When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible, otherwise the floating point constant is lowered into a integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313
*	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes	Martin Storsjo	2018-03-19	1	-0/+14
\| \| \| \| \| \| \| \| \| \| \| \| \|	This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897
*	[ARM] Support for v4f16 and v8f16 vectors	Sjoerd Meijer	2018-03-19	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \|	This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which uses v4f16 and v8f16 vector operands and return values. All the moving parts are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16 intrinsic. In a follow-up patch the rest of the intrinsics and tests will be added. Differential Revision: https://reviews.llvm.org/D44538 llvm-svn: 327839
*	[ARM] FP16 codegen support for VSEL	Sjoerd Meijer	2018-03-16	1	-0/+1
\| \| \| \| \| \| \| \| \|	This implements lowering of SELECT_CC for f16s, which enables codegen of VSEL with f16 types. Differential Revision: https://reviews.llvm.org/D44518 llvm-svn: 327695
*	[ARM] Fix for PR36577	Sjoerd Meijer	2018-03-07	1	-8/+17
\| \| \| \| \| \| \| \| \| \| \| \| \|	Don't PerformSHLSimplify if the given node is used by a node that also uses a constant because we may get stuck in an infinite combine loop. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577 Patch by Sam Parker. Differential Revision: https://reviews.llvm.org/D44097 llvm-svn: 326882
*	[TLS] use emulated TLS if the target supports only this mode	Chih-Hung Hsieh	2018-02-28	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Emulated TLS is enabled by llc flag -emulated-tls, which is passed by clang driver. When llc is called explicitly or from other drivers like LTO, missing -emulated-tls flag would generate wrong TLS code for targets that supports only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341
*	[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations	Pablo Barrio	2018-02-28	1	-0/+61
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before. Patch by Martin Svanfeldt Reviewers: fhahn, pbarrio, rogfer01 Reviewed By: pbarrio, rogfer01 Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 326333
*	[MachineOperand][Target] MachineOperand::isRenamable semantics changes	Geoff Berry	2018-02-23	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Add a target option AllowRegisterRenaming that is used to opt in to post-register-allocation renaming of registers. This is set to 0 by default, which causes the hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq fields of all opcodes to be set to 1, causing MachineOperand::isRenamable to always return false. Set the AllowRegisterRenaming flag to 1 for all in-tree targets that have lit tests that were effected by enabling COPY forwarding in MachineCopyPropagation (AArch64, AMDGPU, ARM, Hexagon, Mips, PowerPC, RISCV, Sparc, SystemZ and X86). Add some more comments describing the semantics of the MachineOperand::isRenamable function and how it is set and maintained. Change isRenamable to check the operand's opcode hasExtraSrcRegAllocReq/hasExtraDstRegAllocReq bit directly instead of relying on it being consistently reflected in the IsRenamable bit setting. Clear the IsRenamable bit when changing an operand's register value. Remove target code that was clearing the IsRenamable bit when changing registers/opcodes now that this is done conservatively by default. Change setting of hasExtraSrcRegAllocReq in AMDGPU target to be done in one place covering all opcodes that have constant pipe read limit restrictions. Reviewers: qcolombet, MatzeB Subscribers: aemerson, arsenm, jyknight, mcrosier, sdardis, nhaehnle, javed.absar, tpr, arichardson, kristof.beyls, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, jordy.potman.lists, apazos, sabuasal, niosHD, escha, nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D43042 llvm-svn: 325931
*	[ARM] Lower BR_CC for f16	Sjoerd Meijer	2018-02-20	1	-2/+1
\| \| \| \| \| \| \| \|	This case wasn't handled yet. Differential Revision: https://reviews.llvm.org/D43508 llvm-svn: 325616
*	[ARM] Materialise some boolean values to avoid a branch	Roger Ferrer Ibanez	2018-02-16	1	-10/+89
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form a != b ? x : 0 a == b ? 0 : x and that currently (e.g. in Thumb1) are emitted as branches. Differential Revision: https://reviews.llvm.org/D34515 llvm-svn: 325323
*	[ARM] Allow 64- and 128-bit types with 't' inline asm constraint	Pablo Barrio	2018-02-15	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: In LLVM, 't' selects a floating-point/SIMD register and only supports 32-bit values. This is appropriately documented in the LLVM Language Reference Manual. However, this behaviour diverges from that of GCC, where 't' selects the s0-s31 registers and its qX and dX variants depending on additional operand modifiers (q/P). For example, the following C code: #include <arm_neon.h> float32x4_t a, b, x; asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b)) results in the following assembly if compiled with GCC: vadd.f32 s0, s0, s1 whereas LLVM will show "error: couldn't allocate output register for constraint 't'", since a, b, x are 128-bit variables, not 32-bit. This patch extends the use of 't' to mean that of GCC, thus allowing selection of the lower Q vector regs and their D/S variants. For example, the earlier code will now compile as: vadd.f32 q0, q0, q1 This behaviour still differs from that of GCC but I think it is actually more correct, since LLVM picks up the right register type based on the datatype of x, while GCC would need an extra operand modifier to achieve the same result, as follows: asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b)) Since this is only an extension of functionality, existing code should not be affected by this change. Note that operand modifiers q/P are already supported by LLVM, so this patch should suffice to support inline assembly with constraint 't' originally built for GCC. Reviewers: grosbach, rengolin Reviewed By: rengolin Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42962 llvm-svn: 325244
*	[ARM] Armv8.2-A FP16 code generation (part 3/3)	Sjoerd Meijer	2018-02-06	1	-0/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds most of the FP16 codegen support, but these areas need further work: - FP16 literals and immediates are not properly supported yet (e.g. literal pool needs work), - Instructions that are generated from intrinsics (e.g. vabs) haven't been added. This will be addressed in follow-up patches. Differential Revision: https://reviews.llvm.org/D42849 llvm-svn: 324321
*	[ARM] FullFP16 LowerReturn Fix	Sjoerd Meijer	2018-02-01	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	Commit r323512 introduced an optimisation in LowerReturn for half-precision return values. A missing check caused a crash when the return value is "undef" (i.e. a node that has no operands). Differential Revision: https://reviews.llvm.org/D42743 llvm-svn: 323968
*	Revert "[ARM] Lower lower saturate to 0 and lower saturate to -1 using ↵	Evgeniy Stepanov	2018-01-31	1	-20/+0
\| \| \| \| \| \| \| \| \| \|	bit-operations" Miscompiles code. Testcase pending. This reverts commit r323869. llvm-svn: 323929
*	[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations	Pablo Barrio	2018-01-31	1	-0/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before. Patch by Marten Svanfeldt. Reviewers: fhahn, pbarrio Reviewed By: pbarrio Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 323869
*	[ARM] Armv8.2-A FP16 code generation (part 2/3)	Sjoerd Meijer	2018-01-31	1	-31/+73
\| \| \| \| \| \| \| \| \| \| \| \|	Half-precision arguments and return values are passed as if it were an int or float for ARM. This results in truncates and bitcasts to/from i16 and f16 values, which are legalized very early to stack stores/loads. When FullFP16 is enabled, we want to avoid codegen for these bitcasts as it is unnecessary and inefficient. Differential Revision: https://reviews.llvm.org/D42580 llvm-svn: 323861
*	[ARM] Armv8.2-A FP16 code generation (part 1/3)	Sjoerd Meijer	2018-01-26	1	-3/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the groundwork for Armv8.2-A FP16 code generation . Clang passes and returns _Float16 values as floats, together with the required bitconverts and truncs etc. to implement correct AAPCS behaviour, see D42318. We will implement half-precision argument passing/returning lowering in the ARM backend soon, but for now this means that this: _Float16 sub(_Float16 a, _Float16 b) { return a + b; } gets lowered to this: define float @sub(float %a.coerce, float %b.coerce) { entry: %0 = bitcast float %a.coerce to i32 %tmp.0.extract.trunc = trunc i32 %0 to i16 %1 = bitcast i16 %tmp.0.extract.trunc to half <SNIP> %add = fadd half %1, %3 <SNIP> } When FullFP16 is not supported, we don't make f16 a legal type, and we get legalization for "free", i.e. nothing changes and everything works as before. And also f16 argument passing/returning is handled. When FullFP16 is supported, we do make f16 a legal type, and have 2 places that we need to patch up: f16 argument passing and returning, which involves minor tweaks to avoid unnecessary code generation for some bitcasts. As a "demonstrator" that this works for the different FP16, FullFP16, softfp modes, etc., I've added match rules to the VSUB instruction description showing that we can codegen this instruction from IR, but more importantly, also to some conversion instructions. These conversions were causing issue before in the FP16 and FullFP16 cases. I've also added match rules to the VLDRH and VSTRH desriptions, so that we can actually compile the entire half-precision sub code example above. This showed that these loads and stores had the wrong addressing mode specified: AddrMode5 instead of AddrMode5FP16, which turned out not be implemented at all, so that has also been added. This is the minimal patch that shows all the different moving parts. In patch 2/3 I will add some efficient lowering of bitcasts, and in 2/3 I will add the remaining Armv8.2-A FP16 instruction descriptions. Thanks to Sam Parker and Oliver Stannard for their help and reviews! Differential Revision: https://reviews.llvm.org/D38315 llvm-svn: 323512
*	[ARM] Expand long shifts for Thumb1 to __aeabi_ calls	Weiming Zhao	2018-01-24	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For long shifts, the inlined version takes about 20 instructions on Thumb1. To avoid the code bloat, expand to __aeabi_ calls if target is Thumb1. Reviewers: samparker Reviewed By: samparker Subscribers: samparker, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42401 llvm-svn: 323354
*	[ARM] Call __chkstk for dynamic stack allocation in all windows environments	Martin Storsjo	2018-01-24	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This matches what MSVC does for alloca() function calls on ARM. Even if MSVC doesn't support VLAs at the language level, it does support the alloca function. On the clang level, both the _alloca() (when emulating MSVC, which is what the alloca() function expands to) and __builtin_alloca() builtin functions, and VLAs, map to the same LLVM IR "alloca" function - so within LLVM they're not distinguishable from each other. Differential Revision: https://reviews.llvm.org/D42292 llvm-svn: 323308
*	[ARM] Optimize {s,u}mul.with.overflow.	Joel Galenson	2018-01-17	1	-5/+32
\| \| \| \| \| \| \| \|	This extends my previous patches to also optimize overflow-checked multiplies during SelectionDAG. Differential revision: https://reviews.llvm.org/D40922 llvm-svn: 322738
*	[ARM] Add codegen for SMMULR, SMMLAR and SMMLSR	Andre Vieira	2018-01-12	1	-55/+98
\| \| \| \| \| \| \| \| \|	This patch teaches the Arm back-end to generate the SMMULR, SMMLAR and SMMLSR instructions from equivalent IR patterns. Differential Revision: https://reviews.llvm.org/D41775 llvm-svn: 322361
*	[ARM] Optimize {s,u}{add,sub}.with.overflow.	Joel Galenson	2017-12-20	1	-2/+71
\| \| \| \| \| \| \| \|	The AArch64 backend contains code to optimize {s,u}{add,sub}.with.overflow during SelectionDAG. This commit ports that code to the ARM backend. Differential revision: https://reviews.llvm.org/D35635 llvm-svn: 321224
*	[ARM] Lower unsigned saturation to USAT	Florian Hahn	2017-12-20	1	-7/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Implement lower of unsigned saturation on an interval [0, k] where k + 1 is a power of two using USAT instruction in a similar way to how [~k, k] is lowered using SSAT on ARM models that supports it. Patch by Marten Svanfeldt Reviewers: t.p.northover, pbarrio, eastig, SjoerdMeijer, javed.absar, fhahn Reviewed By: fhahn Subscribers: fhahn, aemerson, javed.absar, llvm-commits, kristof.beyls Differential Revision: https://reviews.llvm.org/D41348 llvm-svn: 321164
*	Silence a bunch of implicit fallthrough warnings	Adrian Prantl	2017-12-19	1	-1/+1
\| \| \| \|	llvm-svn: 321114
*	X86/AArch64/ARM: Factor out common sincos_stret logic; NFCI	Matthias Braun	2017-12-18	1	-17/+7
\| \| \| \| \| \| \| \| \| \| \|	Note: - X86ISelLowering: setLibcallName(SINCOS) was superfluous as InitLibcalls() already does it. - ARMISelLowering: Setting libcallnames for sincos/sincosf seemed superfluous as in the darwin case it wouldn't be used while for all other cases InitLibcalls already does it. llvm-svn: 321036
*	MachineFunction: Return reference from getFunction(); NFC	Matthias Braun	2017-12-15	1	-28/+28
\| \| \| \| \| \|	The Function can never be nullptr so we can return a reference. llvm-svn: 320884