path: root/llvm/test/CodeGen/AArch64
Commit message (Author, Date; Files, Lines changed)
...
* [AArch64][x86] add tests for bitcasted fnabs; NFC (Sanjay Patel, 2018-10-09; 1 file, -0/+50)
  Alternate target coverage for D44548. llvm-svn: 344059
* [AArch64][v8.5A] Don't create BR instructions in outliner when BTI enabled (Oliver Stannard, 2018-10-08; 1 file, -0/+44)
  When branch target identification is enabled, we can only do indirect tail-calls through x16 or x17. This means that the outliner can't transform a BLR instruction at the end of an outlined region into a BR. Differential revision: https://reviews.llvm.org/D52869 llvm-svn: 343969
* [AArch64][v8.5A] Restrict indirect tail calls to use x16/17 only when using BTI (Oliver Stannard, 2018-10-08; 1 file, -0/+25)
  When branch target identification is enabled, all indirectly-callable functions start with a BTI C instruction. This instruction can only be the target of certain indirect branches (direct branches and fall-through are not affected):
  - A BLR instruction, in either a protected or unprotected page.
  - A BR instruction in a protected page, using x16 or x17.
  - A BR instruction in an unprotected page, using any register.
  Without BTI, we can use any non call-preserved register to hold the address for an indirect tail call. However, when BTI is enabled, the code being compiled might be loaded into a BTI-protected page, where only x16 and x17 can be used for indirect tail calls.
  Legacy code without this restriction can still indirectly tail-call BTI-protected functions, because it will be loaded into an unprotected page, so any register is allowed.
  Differential revision: https://reviews.llvm.org/D52868 llvm-svn: 343968
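  For illustration, a minimal IR sketch of an affected function; the function name and the "branch-target-enforcement" attribute spelling are assumptions of mine, not taken from the actual tests:

    ; With BTI enabled for this function, the tail call below must be
    ; lowered to "br x16" or "br x17", never to a br through another register.
    define void @indirect_tail(void ()* %fn) "branch-target-enforcement" {
    entry:
      tail call void %fn()
      ret void
    }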
* [AArch64][v8.5A] Branch Target Identification code-generation pass (Oliver Stannard, 2018-10-08; 3 files, -0/+327)
  The Branch Target Identification extension, introduced to AArch64 in Armv8.5-A, adds the BTI instruction, which is used to mark valid targets of indirect branches. When enabled, the processor will trap if an instruction in a protected page tries to perform an indirect branch to any instruction other than a BTI. The BTI instruction uses encodings which were NOPs in earlier versions of the architecture, so BTI-enabled code will still run on earlier hardware, just without the extra protection.
  There are 3 variants of the BTI instruction, which are valid targets for different kinds of branches:
  - BTI C can be targeted by call instructions, and is intended to be used at function entry points. These are the BLR instruction, as well as BR with x16 or x17. These BR instructions are allowed for use in PLT entries, and we can also use them to allow indirect tail-calls.
  - BTI J can be targeted by BR only, and is intended to be used by jump tables.
  - BTI JC acts as both a BTI C and a BTI J instruction, and can be targeted by any BLR or BR instruction.
  Note that RET instructions are not restricted by branch target identification; the reason for this is that return addresses can be protected more effectively using return address signing. Direct branches and calls are also unaffected, as it is assumed that an attacker cannot modify executable pages (if they could, they wouldn't need to do a ROP/JOP attack).
  This patch adds a MachineFunctionPass which:
  - Adds a BTI C at the start of every function which could be indirectly called (either because it is address-taken, or externally visible so it could be address-taken in another translation unit).
  - Adds a BTI J at the start of every basic block which could be indirectly branched to. This could be done either by a jump table, or by taking the address of the block (e.g. using the GCC label values extension).
  We only need to use BTI JC when a function is indirectly-callable and takes the address of the entry block. I've not been able to trigger this from C or IR, but I've included a MIR test just in case.
  Using BTI C at function entries relies on the fact that no other code in BTI-protected pages uses indirect tail-calls, unless they use x16 or x17 to hold the address. I'll add that code-generation restriction as a separate patch.
  Differential revision: https://reviews.llvm.org/D52867 llvm-svn: 343967
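  A rough sketch of where the pass would place BTI instructions; the attribute spelling and the expected output noted in the comments are assumptions:

    define void @callee(i8* %dest) "branch-target-enforcement" {
    entry:
      ; expected first instruction: bti c (the function is address-taken via @fptr)
      indirectbr i8* %dest, [label %bb1, label %bb2]
    bb1:
      ; expected: bti j (possible target of the indirect branch)
      ret void
    bb2:
      ; expected: bti j
      ret void
    }

    @fptr = global void (i8*)* @callee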
* [AArch64] Fix verifier error when outlining indirect calls (Oliver Stannard, 2018-10-08; 1 file, -3/+44)
  The MachineOutliner for AArch64 transforms indirect calls into indirect tail calls, replacing the call with the TCRETURNri pseudo-instruction. This pseudo lowers to a BR, but has the isCall and isReturn flags set. The problem is that TCRETURNri takes a tcGPR64 as the register argument, to prevent indirect tail-calls from using caller-saved registers. The indirect calls transformed by the outliner could use caller-saved registers. This is fine, because the outliner ensures that the register is available at all call sites. However, this causes a verifier failure when the register is not in tcGPR64. The fix is to add a new pseudo-instruction like TCRETURNri, but which accepts any GPR. Differential revision: https://reviews.llvm.org/D52829 llvm-svn: 343959
* [AARCH64][X86] Remove _nonsplat from test names (Simon Pilgrim, 2018-10-07; 1 file, -12/+12)
  As discussed on D50222. llvm-svn: 343934
* [GlobalIsel] Add llvm.invariant.start and llvm.invariant.end (Jessica Paquette, 2018-10-05; 1 file, -0/+14)
  Port over the implementation in SelectionDAGBuilder.cpp into the IRTranslator and update the arm64-irtranslator test. These were causing fallbacks in CTMark/Bullet (-Rpass-missed=gisel-select), and this patch fixes that. https://reviews.llvm.org/D52945 llvm-svn: 343885
* [globalisel][combine] When placing truncates, handle the case when the BB is empty (Daniel Sanders, 2018-10-04; 1 file, -1/+59)
  GlobalISel uses MIR with implicit fallthrough on each basic block. As a result, getFirstNonPhi() can return end(). llvm-svn: 343829
* [globalisel][combine] Fix a rare crash when encountering an instruction whose op0 isn't a reg (Daniel Sanders, 2018-10-04; 1 file, -3/+43)
  The simplest instance of this is an intrinsic with no results, which will have the intrinsic ID as operand 0. Also fix some benign incorrectness when op0 is a reg but isn't a def, which was guarded against by checking for the extension opcodes. llvm-svn: 343821
* [globalisel][combine] Improve the truncate placement for the extending-loads combine (Daniel Sanders, 2018-10-04; 2 files, -5/+67)
  This brings the extending loads patch back to the original intent, but minus the PHI bug and with another small improvement to de-dupe truncates that are inserted into the same block. The truncates are sunk to their uses unless this would require inserting before a phi, in which case it sinks to the _beginning_ of the predecessor block for that path (but no earlier than the def). The reason for choosing the beginning of the predecessor is that it makes de-duping multiple truncates in the same block simple, and optimized code is going to run a scheduler at some point, which will likely change the position anyway. llvm-svn: 343804
* AArch64: Fix XSeqPairs/WSeqPairs problems (Matthias Braun, 2018-10-04; 1 file, -0/+42)
  - Fix spills/reloads of XSeqPairs failing with vregs (only physregs worked correctly)
  - Add missing spill/reload code for the WSeqPairs class
  Differential Revision: https://reviews.llvm.org/D52761 llvm-svn: 343799
* [globalisel][combines] Don't sink G_TRUNC down to use if that use is a G_PHI (Daniel Sanders, 2018-10-03; 1 file, -0/+111)
  This fixes a problem where the register allocator fails to eliminate a PHI because there's a non-PHI in the middle of the PHI instructions at the start of a BB. This G_TRUNC can be better placed, but this at least fixes the correctness issue quickly. I'll follow up with a patch to the verifier to catch this kind of bug in future. llvm-svn: 343693
* [globalisel] Fix one more missing Verifier pass from gisel-commandline-option.ll (Daniel Sanders, 2018-10-03; 1 file, -0/+1)
  llvm-svn: 343658
* Add the missing new files from r343654 (Daniel Sanders, 2018-10-03; 1 file, -0/+450)
  llvm-svn: 343655
* Re-commit: [globalisel] Add combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 (Daniel Sanders, 2018-10-03; 3 files, -1/+3)
  Summary: Depends on D45541.
  Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson
  Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45543
  The previous commit failed portions of the test-suite on GreenDragon due to duplicate COPY instructions and iterator invalidation. Both issues have now been fixed. To assist with this, a helper (cloneVirtualRegister) has been added to MachineRegisterInfo that can be used to get another register that has the same type and class/bank as an existing one. llvm-svn: 343654
* [globalisel] Attempt to fix llvm-clang-x86_64-expensive-checks-win (Daniel Sanders, 2018-10-02; 1 file, -10/+21)
  The behaviour of this bot indicates that -verify-machineinstrs has been forced on and is therefore inserting the verifier on builds that don't expect it. Explicitly specify whether it's enabled or disabled for each test. llvm-svn: 343633
* Revert "X86, AArch64, ARM: Do not attach debug location to spill/reload ↵Matt Morehouse2018-10-021-32/+0
| | | | | | | | instructions" This reverts r343520 due to breakage of HWASan tests on Android. llvm-svn: 343616
* [AArch64][DAGCombiner]: change -stop-after=isel to instruction-select (Fangrui Song, 2018-10-02; 1 file, -1/+1)
  "isel" is registered by AMDGPU. The test will break if the AMDGPU target is not built. llvm-svn: 343553
* Revert: r343521 and r343541: [globalisel] Add combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 (Daniel Sanders, 2018-10-01; 4 files, -453/+1)
  There's a strange assertion on two of the Green Dragon bots that goes away when this is reverted. The assertion is in RegBankAlloc, and if it is this commit then -verify-machine-instrs should have caught it earlier in the pipeline. llvm-svn: 343546
* [globalisel] Add combiner helpers for extending loads and use them in a pre-legalize combiner for AArch64 (Daniel Sanders, 2018-10-01; 4 files, -1/+453)
  Summary: Depends on D45541.
  Reviewers: ab, aditya_nandakumar, bogner, rtereshin, volkan, rovka, javed.absar, aemerson
  Subscribers: aemerson, rengolin, mgorny, javed.absar, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45543 llvm-svn: 343521
* X86, AArch64, ARM: Do not attach debug location to spill/reload instructions (Matthias Braun, 2018-10-01; 1 file, -0/+32)
  Spill/reload instructions are artificially generated by the compiler and have no relation to the original source code. So the best thing to do is not attach any debug location to them (instead of just taking the next debug location we find on following instructions). Differential Revision: https://reviews.llvm.org/D52125 llvm-svn: 343520
* DAGCombiner: StoreMerging: Fix bad index calculation when adjusting mismatching vector types (Matthias Braun, 2018-10-01; 1 file, -0/+20)
  This fixes a case of bad index calculation when merging mismatching vector types. It changes the existing code to just use the existing extract_{subvector|element} and a bitcast (instead of bitcasting first and then creating a new extract_xxx), so we don't need to adjust any indices in the first place. rdar://44584718 Differential Revision: https://reviews.llvm.org/D52681 llvm-svn: 343493
* [NFC][CodeGen][X86][AArch64] Add 64-bit constant bit field extract pattern tests (Roman Lebedev, 2018-09-30; 1 file, -0/+50)
  llvm-svn: 343404
* [AArch64] Split zero cycle feature more granularly (Evandro Menezes, 2018-09-28; 2 files, -42/+180)
  Split the `zcz` feature into specific ones for GP and FP registers, `zcz-gp` and `zcz-fp` respectively, while retaining the original feature option to mean both. Differential revision: https://reviews.llvm.org/D52621 llvm-svn: 343354
* Revert r343317 (Luke Cheeseman, 2018-09-28; 1 file, -11/+11)
  - ASan buildbots are breaking and I need to investigate the issue. llvm-svn: 343341
* Reapply changes reverted by r343235 (Luke Cheeseman, 2018-09-28; 1 file, -11/+11)
  - Add a fix so that all code paths that create a DWARFContext with an ObjectFile initialise the target architecture in the context
  - Add an assert that the Arch is known in the Dwarf CallFrameString method
  llvm-svn: 343317
* Revert r343192 as a UBSan build is currently failing (Luke Cheeseman, 2018-09-27; 1 file, -11/+11)
  llvm-svn: 343235
* Reapply changes reverted in r343114, lldb patch to follow shortly (Luke Cheeseman, 2018-09-27; 1 file, -11/+11)
  llvm-svn: 343192
* Revert r343112 as CallFrameString API change has broken lldb builds (Luke Cheeseman, 2018-09-26; 1 file, -11/+11)
  llvm-svn: 343114
* [AArch64] - Return address signing dwarf support (Luke Cheeseman, 2018-09-26; 1 file, -11/+11)
  - Reapply r343089 with a fix for DebugInfo/Sparc/gnu-window-save.ll
  llvm-svn: 343112
* [CodeGen] Always print register ties in MI::dump() (Francis Visoiu Mistrih, 2018-09-26; 1 file, -6/+6)
  This was already the case when calling MO::dump(), but MI::dump() was still depending on hasComplexRegisterTies(). The MIR output is not affected. llvm-svn: 343107
* Revert r343089 "[AArch64] - Return address signing dwarf support" (Hans Wennborg, 2018-09-26; 1 file, -11/+11)
  This caused the DebugInfo/Sparc/gnu-window-save.ll test to fail.
  > Functions that have signed return addresses need additional dwarf support:
  > - After signing the LR, and before authenticating it, the LR register is in a state that is unusable by a debugger or unwinder
  > - To account for this a new directive, .cfi_negate_ra_state, is added
  > - This directive says the signed state of the LR register has now changed, i.e. unsigned -> signed or signed -> unsigned
  > - This directive has the same CFA code as the SPARC directive GNU_window_save (0x2d), adding a macro to account for multiply defined codes
  > - This patch matches the gcc implementation of this support: https://patchwork.ozlabs.org/patch/800271/
  >
  > Differential Revision: https://reviews.llvm.org/D50136
  llvm-svn: 343103
* [AArch64] - Return address signing dwarf support (Luke Cheeseman, 2018-09-26; 1 file, -11/+11)
  Functions that have signed return addresses need additional dwarf support:
  - After signing the LR, and before authenticating it, the LR register is in a state that is unusable by a debugger or unwinder
  - To account for this a new directive, .cfi_negate_ra_state, is added
  - This directive says the signed state of the LR register has now changed, i.e. unsigned -> signed or signed -> unsigned
  - This directive has the same CFA code as the SPARC directive GNU_window_save (0x2d), adding a macro to account for multiply defined codes
  - This patch matches the gcc implementation of this support: https://patchwork.ozlabs.org/patch/800271/
  Differential Revision: https://reviews.llvm.org/D50136 llvm-svn: 343089
* Re-submitting changes in D51550 because it failed to patch. (Christy Lee, 2018-09-24; 1 file, -0/+2)
  Reviewers: javed.absar, trentxintong, courbet
  Reviewed By: trentxintong
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D52433
  llvm-svn: 342919
* [DAGCombiner] use UADDO to optimize saturated unsigned add (Sanjay Patel, 2018-09-24; 1 file, -6/+4)
  This is a preliminary step towards solving PR14613: https://bugs.llvm.org/show_bug.cgi?id=14613
  If we have an 'add' instruction that sets flags, we can use that to eliminate an explicit compare instruction or some other instruction (cmn) that sets flags for use in the later select.
  As shown in the unchanged tests that use 'icmp ugt %x, %a', we're effectively reversing an IR icmp canonicalization that replaces a variable operand with a constant: https://rise4fun.com/Alive/V1Q
  But we're not using 'uaddo' in those cases via DAG transforms. This happens in CGP after D8889 without checking target lowering to see if the op is supported. So AArch already shows 'uaddo' codegen for the i8/i16/i32/i64 test variants with "using_cmp_sum" in the title. That's the pattern that CGP matches as an unsigned saturated add and converts to uaddo without checking target capabilities.
  This patch is gated by isOperationLegalOrCustom(ISD::UADDO, VT), so we only see AArch diffs for i32/i64 in the tests with "using_cmp_notval" in the title (unlike x86, which sees improvements for all sizes because all sizes are 'custom'). But the AArch code (like x86) looks better when translated to 'uaddo' in all cases. So someone who is involved with AArch may want to set i8/i16 to 'custom' for UADDO, so this patch will fire on those tests.
  Another possibility given the existing behavior: we could remove the legal-or-custom check altogether because we're assuming that a UADDO sequence is canonical/optimal before we ever reach here. But that seems like a bug to me. If the target doesn't have an add-with-flags op, then it's not likely that we'll get optimal DAG combining using a UADDO node. This is similar justification for why we don't canonicalize IR to the overflow math intrinsic sibling (llvm.uadd.with.overflow) for UADDO in the first place.
  Differential Revision: https://reviews.llvm.org/D51929 llvm-svn: 342886
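  The "using_cmp_sum" saturated-add pattern discussed above looks roughly like this in IR (function name is mine):

    define i64 @uadd_sat_using_cmp_sum(i64 %x, i64 %y) {
      %a = add i64 %x, %y
      %c = icmp ugt i64 %x, %a           ; the sum wrapped, i.e. unsigned overflow
      %r = select i1 %c, i64 -1, i64 %a  ; clamp to all-ones on overflow
      ret i64 %r
    }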
* [NFC][CodeGen][X86][AArch64] More tests for 'bit field extract' w/ constants (Roman Lebedev, 2018-09-24; 1 file, -0/+73)
  It would be best to introduce ISD::BitFieldExtract, because clearly more than one backend faces the same problem. But for now let's solve this in the x86-specific DAG combine. https://bugs.llvm.org/show_bug.cgi?id=38938 llvm-svn: 342880
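  The kind of pattern these tests exercise, sketched in IR (constants chosen arbitrarily):

    define i64 @bextr_like(i64 %x) {
      %shifted = lshr i64 %x, 19
      %masked  = and i64 %shifted, 4095   ; extract 12 bits starting at bit 19
      ret i64 %masked
    }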
* [AArch64] Support adding X[8-15,18] registers as CSRs. (Tri Vo, 2018-09-22; 2 files, -0/+169)
  Summary: Specifying X[8-15,18] registers as callee-saved is used to support CONFIG_ARM64_LSE_ATOMICS in the Linux kernel. As part of this patch we:
  - use a custom CSR list/mask when the user specifies custom CSRs
  - update Machine Register Info's list of CSRs with additional custom CSRs in LowerCall and LowerFormalArguments
  Reviewers: srhines, nickdesaulniers, efriedma, javed.absar
  Reviewed By: nickdesaulniers
  Subscribers: kristof.beyls, jfb, llvm-commits
  Differential Revision: https://reviews.llvm.org/D52216
  llvm-svn: 342824
* [NFC][x86][AArch64] Add BEXTR-like test patterns. (Roman Lebedev, 2018-09-20; 1 file, -0/+717)
  Summary: Also, adjust the check prefixes so that we actually get to check the BMI1-only case.
  Reviewers: craig.topper, RKSimon, spatel, javed.absar
  Reviewed By: RKSimon
  Subscribers: kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48490
  llvm-svn: 342623
* AArch64: Add FuseCryptoEOR fusion rules (Matthias Braun, 2018-09-19; 1 file, -0/+75)
  There are some additional rules available on newer Apple CPUs. rdar://41235346 llvm-svn: 342590
* [TargetLowering] Android has sincos functions (John Brawn, 2018-09-18; 1 file, -0/+1)
  Since Android API version 9, the Android libm has had the sincos functions, so they should be recognised as libcalls and the sincos optimisation should be applied. Differential Revision: https://reviews.llvm.org/D52025 llvm-svn: 342471
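  A minimal sketch of code that benefits, assuming an Android triple (the API level is chosen arbitrarily); whether the combine actually fires also depends on fast-math/errno settings:

    target triple = "aarch64-linux-android21"

    declare double @sin(double)
    declare double @cos(double)

    define double @sin_plus_cos(double %x) {
      ; with sincos recognised as a libcall, these two calls
      ; can be merged into a single sincos call
      %s = call double @sin(double %x)
      %c = call double @cos(double %x)
      %r = fadd double %s, %c
      ret double %r
    }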
* [AArch64] Add integer abs testcases for D51873. (Simon Pilgrim, 2018-09-13; 1 file, -0/+52)
  llvm-svn: 342156
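  For context, a typical integer abs IR pattern that such tests cover (a sketch, not taken from the actual test file):

    define i32 @abs32(i32 %x) {
      %neg = sub i32 0, %x
      %cmp = icmp sgt i32 %x, -1
      %abs = select i1 %cmp, i32 %x, i32 %neg   ; abs(x) via compare+select
      ret i32 %abs
    }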
* [AArch64] Implement aarch64_vector_pcs codegen support. (Sander de Smalen, 2018-09-12; 1 file, -0/+253)
  This patch adds codegen support for saving/restoring V8-V23 for functions specified with the aarch64_vector_pcs calling convention attribute, as added in patch D51477.
  Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
  Reviewed By: thegameg
  Differential Revision: https://reviews.llvm.org/D51479
  llvm-svn: 342049
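  The calling convention appears in IR like this (a minimal sketch):

    declare aarch64_vector_pcs void @vec_callee()

    define void @caller() {
      ; under aarch64_vector_pcs, v8-v23 are preserved across this call,
      ; so @vec_callee must save/restore any of them that it clobbers
      call aarch64_vector_pcs void @vec_callee()
      ret void
    }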
* [MachineOutliner] Add codegen size remarks to the MachineOutliner (Jessica Paquette, 2018-09-11; 1 file, -0/+81)
  Since the outliner is a module pass, it doesn't get codegen size remarks like the other codegen passes do. This adds size remarks *to* the outliner.
  This is kind of a workaround, so it's peppered with FIXMEs; size remarks really ought to not ever be handled by the pass itself. However, since the outliner is the only "MachineModulePass", this works for now. Since the entire purpose of the MachineOutliner is to produce code size savings, it really ought to be included in codegen size remarks.
  If we ever go ahead and make a MachineModulePass (say, something similar to MachineFunctionPass), then all of this ought to be moved there. llvm-svn: 342009
* [GlobalISel] Lower dbg.declare into indirect DBG_VALUE (Josh Stone, 2018-09-11; 2 files, -3/+68)
  Summary: D31439 changed the semantics of dbg.declare to take the address of a variable as the first argument, making it indirect. It specifically updated FastISel for this change here: https://reviews.llvm.org/D31439#change-WVArzi177jPl
  GlobalISel needs to follow suit, or else it will be missing a level of indirection in the generated debuginfo. This problem was seen in a Rust debuginfo test on aarch64, since GlobalISel is used at -O0 for aarch64.
  https://github.com/rust-lang/rust/issues/49807
  https://bugzilla.redhat.com/show_bug.cgi?id=1611597
  https://bugzilla.redhat.com/show_bug.cgi?id=1625768
  Reviewers: dblaikie, aprantl, t.p.northover, javed.absar, rnk
  Reviewed By: rnk
  Subscribers: #debug-info, rovka, kristof.beyls, JDevlieghere, llvm-commits, tstellar
  Differential Revision: https://reviews.llvm.org/D51749
  llvm-svn: 341969
* [DagCombine][NFC] Some more tests for the X % C == 0 (UREM case) transform (Roman Lebedev, 2018-09-11; 2 files, -0/+406)
  For https://reviews.llvm.org/D50222. Patch by: hermord (Dmytro Shynkevych)! llvm-svn: 341953
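  The transform under test targets this kind of pattern, where D50222 folds the division away in favour of a multiply-by-magic-constant check (a sketch):

    define i1 @urem_is_zero(i32 %x) {
      %rem = urem i32 %x, 7      ; X % C with C a non-power-of-2 constant
      %cmp = icmp eq i32 %rem, 0
      ret i1 %cmp
    }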
* [AArch64] test codegen for unsigned saturated add; NFC (Sanjay Patel, 2018-09-11; 1 file, -0/+729)
  This is identical to the tests added for x86 at rL341845. A semi-generic DAGCombine should improve things universally. llvm-svn: 341935
* [AArch64] Support reserving x1-7 registers. (Nick Desaulniers, 2018-09-07; 2 files, -0/+76)
  Summary: Reserving registers x1-7 is used to support CONFIG_ARM64_LSE_ATOMICS in the Linux kernel. This change adds support for reserving registers x1 through x7.
  Reviewers: javed.absar, phosek, srhines, nickdesaulniers, efriedma
  Reviewed By: nickdesaulniers, efriedma
  Subscribers: niravd, jfb, manojgupta, nickdesaulniers, jyknight, efriedma, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D48580
  llvm-svn: 341706
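  Presumably exercised with something like the following; the -mattr feature spelling is an assumption based on the patch description:

    ; hypothetical RUN line:
    ;   llc -mtriple=aarch64-linux-gnu -mattr=+reserve-x4 %s -o -
    ; With x4 reserved, the register allocator must never use x4
    ; anywhere in the output, even under register pressure.
    define i64 @sum4(i64 %a, i64 %b, i64 %c, i64 %d) {
      %ab = add i64 %a, %b
      %cd = add i64 %c, %d
      %r  = add i64 %ab, %cd
      ret i64 %r
    }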
* ARM64: improve non-zero memset isel by ~2x (JF Bastien, 2018-09-06; 1 file, -67/+37)
  Summary: I added a few ARM64 memset codegen tests in r341406 and r341493, and annotated where the generated code was bad. This patch fixes the majority of the issues by requesting that a 2xi64 vector be used for memset of 32 bytes and above.
  The patch leaves the former request for f128 unchanged, despite f128 materialization being suboptimal: doing otherwise runs into other asserts in isel and makes this patch too broad.
  This patch hides the issue that was present in bzero_40_stack and bzero_72_stack because the code now generates in a better order which doesn't have the store offset issue. I'm not aware of that issue appearing elsewhere at the moment.
  <rdar://problem/44157755>
  Reviewers: t.p.northover, MatzeB, javed.absar
  Subscribers: eraman, kristof.beyls, chrib, dexonsmith, llvm-commits
  Differential Revision: https://reviews.llvm.org/D51706
  llvm-svn: 341558
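  The improved case, sketched; the expected instructions in the comment are approximate:

    declare void @llvm.memset.p0i8.i64(i8* nocapture, i8, i64, i1)

    define void @memset_32_nonzero(i8* %p) {
      ; for a 32-byte non-zero memset, a 2xi64 vector is now requested,
      ; giving roughly:
      ;   movi v0.16b, #42
      ;   stp  q0, q0, [x0]
      call void @llvm.memset.p0i8.i64(i8* %p, i8 42, i64 32, i1 false)
      ret void
    }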
* NFC: more memset inline arm64 coverage (JF Bastien, 2018-09-05; 1 file, -0/+210)
  I'm looking at some codegen optimization in this area and want to make sure I understand the current codegen and don't regress it. This patch further expands the tests (which I already expanded in r341406) to capture more of the current code generation when it comes to stack-based small non-zero memset on arm64. This patch annotates some potential fixes. llvm-svn: 341493
* [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x)) (Sanjay Patel, 2018-09-05; 1 file, -59/+10)
  This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. Here, we only do the transform when the target tells us that sqrt can be lowered with inline code. This is the basic case. Some potential enhancements are in the TODO comments:
  1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
  2. If we have fewer fast-math-flags, generate code to avoid -0.0 and/or INF.
  3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).
  Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should also be covered by existing tests).
  Differential Revision: https://reviews.llvm.org/D51630 llvm-svn: 341481
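  The basic case, sketched in IR; the fast flag below stands in for whatever fast-math flags the transform actually requires:

    declare float @llvm.pow.f32(float, float)

    define float @pow_quarter(float %x) {
      ; with sqrt cheap on the target, this can become sqrt(sqrt(x))
      ; instead of a call to powf
      %r = call fast float @llvm.pow.f32(float %x, float 0.25)
      ret float %r
    }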