bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.	Shiva Chen	2018-05-09	22	-39/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to set breakpoints on labels and list source code around labels, we need to collect debug information for labels, i.e., the label name, the function the label belongs to, the line number in the file, and the address where the label is located. To keep this information in LLVM IR and to allow the backend to generate debug information correctly, we create a new kind of metadata for labels, DILabel. The format of DILabel is !DILabel(scope: !1, name: "foo", file: !2, line: 3) We hope to keep as much debug information as possible even when the code is optimized. So, we create a new kind of intrinsic for label metadata to prevent the metadata from being eliminated with its basic block. The intrinsic will remain as long as we keep it from being optimized out. The format of the intrinsic is llvm.dbg.label(metadata !1) It has only one argument, which is the DILabel metadata. The intrinsic immediately follows the label. The backend can get the label metadata through the intrinsic's parameter. We also create a DIBuilder API for labels to be used by the frontend. The frontend can use createLabel() to allocate DILabel objects, and use insertLabel() to insert the llvm.dbg.label intrinsic in LLVM IR. Differential Revision: https://reviews.llvm.org/D45024 Patch by Hsiangkai Wang. llvm-svn: 331841
*	[globalisel] Remove redundant -global-isel option from tests that use -run-pass. NFC	Daniel Sanders	2018-05-05	12	-31/+31
\| \| \| \| \| \| \| \| \| \| \|	As Roman Tereshin pointed out in https://reviews.llvm.org/D45541, the -global-isel option is redundant when -run-pass is given. -global-isel sets up the GlobalISel passes in the pass manager but -run-pass skips that entirely and configures its own pipeline. llvm-svn: 331603
*	ARM: don't try to over-align large vectors as arguments.	Tim Northover	2018-05-03	2	-19/+62
\| \| \| \| \| \| \| \| \| \| \| \|	By default LLVM thinks very large vectors get aligned to their size when passed across functions. Unfortunately no-one told the ARM backend so it doesn't trigger stack realignment and so accesses can cause the usual misalignment issues (e.g. a data abort). This changes the ABI alignment to the stack alignment, which in practice (and as a bonus) also coincides with the alignment "natural" vectors get. llvm-svn: 331451
*	[DAGCombiner] Fix SDLoc in a (zext (zextload x)) combine (4/N)	Vedant Kumar	2018-05-01	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The logic for this combine is almost identical to the logic for a (sext (sextload x)) combine. This commit factors out the logic so it can be shared by both combines, and corrects the SDLoc assigned in the zext version of the combine. Prior to this patch, for the given test case, we would apply the location associated with the udiv instruction to instructions which perform the load. Part of: llvm.org/PR37262 llvm-svn: 331303
*	[DAGCombiner] Fix SDLoc in a (sext (sextload x)) combine (3/N)	Vedant Kumar	2018-05-01	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \|	Prior to this patch, for the given test case, we would apply the location associated with the sdiv instruction to instructions which perform the load. Part of: llvm.org/PR37262. Differential Revision: https://reviews.llvm.org/D46222 llvm-svn: 331302
*	[DAGCombiner] Set the right SDLoc on a newly-created zextload (1/N)	Vedant Kumar	2018-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting the right SDLoc on a newly-created zextload fixes a line table bug which resulted in non-linear stepping behavior. Several backend tests contained CHECK lines which relied on the IROrder inherited from the wrong SDLoc. This patch breaks that dependence where feasible and regenerates test cases where not. In some cases, changing a node's IROrder may alter register allocation and spill behavior. This can affect performance. I have chosen not to prevent this by applying a "known good" IROrder to SDLocs, as this may hide a more general bug in the scheduler, or cause regressions on other test inputs. rdar://33755881, Part of: llvm.org/PR37262 Differential Revision: https://reviews.llvm.org/D45995 llvm-svn: 331300
*	[globalisel][legalizerinfo] Add support for legalization based on the MachineMemOperand	Daniel Sanders	2018-04-27	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently only the memory size is supported but others can be added as needed. narrowScalar for G_LOAD and G_STORE now correctly updates the MachineMemOperand and will refuse to legalize atomics since those need more careful expansions to maintain atomicity. Reviewers: ab, aditya_nandakumar, bogner, rtereshin, aemerson, javed.absar Reviewed By: aemerson Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D45466 llvm-svn: 331071
*	[ARM] Codegen for v8.2A dot product intrinsics	Oliver Stannard	2018-04-27	1	-0/+82
\| \| \| \| \| \| \| \| \|	This adds IR intrinsics for the ARM dot-product instructions introduced in v8.2-A. Differential revision: https://reviews.llvm.org/D46106 llvm-svn: 331032
*	[ARM] Enable misched for R52.	David Green	2018-04-27	1	-1/+1
\| \| \| \| \| \| \| \| \|	Back when the R52 schedule was added in rL286949, there was no way to enable machine schedules in ARM for specific cores. Since then a target feature has been added. This enables the feature for R52, removing the need to manually specify compiler flags. llvm-svn: 331027
*	[MIR] Add support for debug metadata for fixed stack objects	Francis Visoiu Mistrih	2018-04-25	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Debug var, expr and loc were only supported for non-fixed stack objects. This patch adds the following fields to the "fixedStack:" entries, and renames the ones from "stack:" to: * debug-info-variable * debug-info-expression * debug-info-location Differential Revision: https://reviews.llvm.org/D46032 llvm-svn: 330859
*	[DAGCombine] (float)((int) f) --> ftrunc (PR36617)	Sanjay Patel	2018-04-20	1	-0/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was originally committed at rL328921 and reverted at rL329920 to investigate failures in Chrome. This time I've added to the ReleaseNotes to warn users of the potential of exposing UB and let me repeat that here for more exposure: Optimization of floating-point casts is improved. This may cause surprising results for code that is relying on undefined behavior. Code sanitizers can be used to detect affected patterns such as this: int main() { float x = 4294967296.0f; x = (float)((int)x); printf("junk in the ftrunc: %f\n", x); return 0; } $ clang -O1 ftrunc.c -fsanitize=undefined ; ./a.out ftrunc.c:5:15: runtime error: 4.29497e+09 is outside the range of representable values of type 'int' junk in the ftrunc: 0.000000 Original commit message: fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 330437
*	[ARM] Add some missing FP16 VSEL test cases	Sjoerd Meijer	2018-04-19	1	-8/+83
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D45724 llvm-svn: 330313
*	MachO: trap unreachable instructions	Tim Northover	2018-04-13	2	-1/+11
\| \| \| \| \| \| \|	Debuggability is more important than saving 4 bytes to let us fall through to nonsense. llvm-svn: 330073
*	[ARM] FP16 vmaxnm/vminnm scalar instructions	Sjoerd Meijer	2018-04-13	3	-1/+790
\| \| \| \| \| \| \| \| \|	This adds code generation support for the FP16 vmaxnm/vminnm scalar instructions. Differential Revision: https://reviews.llvm.org/D44675 llvm-svn: 330034
*	[NEON] Support intrinsic for scalar and vector versions of the VRINTN instruction	Ivan A. Kosarev	2018-04-13	1	-0/+11
\| \| \| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D45514 llvm-svn: 330011
*	revert r328921 - [DAGCombine] (float)((int) f) --> ftrunc (PR36617)	Sanjay Patel	2018-04-12	1	-42/+0
\| \| \| \| \| \| \|	This change is exposing UB in source code - as was warned/predicted. :) See D44909 for discussion. Reverting while we figure out how to fix things. llvm-svn: 329920
*	[FastISel] Disable local value sinking by default	Reid Kleckner	2018-04-11	5	-25/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is causing compilation timeouts on code with long sequences of local values and calls (i.e. foo(1); foo(2); foo(3); ...). It turns out that code coverage instrumentation is a great way to create sequences like this, which is how our users ran into the issue in practice. Intel has a tool that detects these kinds of non-linear compile time issues, and Andy Kaylor reported it as PR37010. The current sinking code scans the whole basic block once per local value sink, which happens before emitting each call. In theory, local values should only be introduced to be used by instructions between the current flush point and the last flush point, so we should only need to scan those instructions. llvm-svn: 329822
*	[ARM] FP16 VSEL codegen	Sjoerd Meijer	2018-04-11	1	-12/+142
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow up of rL327695 to instruction select more variants of VSELGT and VSELGE, for which it is necessary to custom lower SELECT. More work is required in this area, which will be addressed soon: - more variants need to be regression tested, but this depends on the next point. - first LowerConstantFP need to be adjusted for fp16 values. Differential Revision: https://reviews.llvm.org/D45205 llvm-svn: 329788
*	[CodeGen] Fix printing bundles in MIR output	Krzysztof Parzyszek	2018-04-10	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Delay printing the newline until after the opening bracket was printed, e.g. BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } instead of BUNDLE implicit-def $r1, implicit-def $r21, implicit $r1 { renamable $r1 = S2_asr_i_r renamable $r1, 1 renamable $r21 = A2_tfrsi 0 } llvm-svn: 329719
*	[DAGCombine] Improve ReduceLoad for SRL	Sam Parker	2018-04-09	1	-4/+115
\| \| \| \| \| \| \| \| \| \| \| \| \|	Recommitting r329283, third time lucky... If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329551
*	[DAGCombiner] Fold (zext (and/or/xor (shl/shr (load x), cst), cst))	Guozhi Wei	2018-04-07	1	-0/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	In our real-world application, we found that the following optimization is missed in DAGCombiner: (zext (and/or/xor (shl/shr (load x), cst), cst)) -> (and/or/xor (shl/shr (zextload x), (zext cst)), (zext cst)) If the user of the original zext is an add, it may enable further lea optimization on x86. This patch adds a new function CombineZExtLogicopShiftLoad to do this optimization. Differential Revision: https://reviews.llvm.org/D44402 llvm-svn: 329516
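The fold described above can be checked on small integers. The following is only a sketch in Python, not the DAGCombiner code: it assumes an i8 load widened to i32, models zext as leaving the high bits zero, and covers just the and + shl/shr form of the pattern. All function names are hypothetical.

```python
I8_MASK = 0xFF
I32_MASK = 0xFFFFFFFF


def narrow_then_zext(x: int, shift: int, cst: int, left: bool) -> int:
    """Model of (zext (and (shl/shr (load x), shift), cst)) done in i8."""
    shifted = (x << shift) & I8_MASK if left else x >> shift
    # Zero-extending the i8 result to i32 only adds zero bits on top.
    return shifted & cst & I8_MASK


def zextload_then_wide(x: int, shift: int, cst: int, left: bool) -> int:
    """Model of (and (shl/shr (zextload x), shift), (zext cst)) done in i32."""
    wide = x & I8_MASK  # zextload: the i8 value held in an i32 register
    shifted = (wide << shift) & I32_MASK if left else wide >> shift
    # The zero-extended constant has its upper 24 bits clear, so the AND
    # discards any bits a left shift moved above bit 7.
    return shifted & (cst & I8_MASK)


if __name__ == "__main__":
    for x in range(256):
        for shift in range(8):
            for cst in (0x00, 0x0F, 0x3C, 0x80, 0xFF):
                for left in (True, False):
                    assert narrow_then_zext(x, shift, cst, left) == \
                        zextload_then_wide(x, shift, cst, left)
    print("narrow and wide forms agree")
```

In this model the AND with the zero-extended constant is what keeps the left-shift case equal in both widths; the or/xor forms with a left shift are not modelled here.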
*	[DAGCombiner] Add a combine to turn a build vector of zero extends of extract vector elts into a vector zero extend and possibly an extract subvector.	Craig Topper	2018-04-07	1	-9/+5
\| \| \| \| \| \|	llvm-svn: 329509
*	Reapply ARM: Do not spill CSR to stack on entry to noreturn functions	Tim Northover	2018-04-07	2	-1/+66
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Should fix UBSan bot by also checking there's no "uwtable" attribute before skipping. Otherwise the unwind table will be useless since its moves expect CSRs to actually be preserved. A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch mostly by myeisha (pmb). llvm-svn: 329494
*	Revert "ARM: Do not spill CSR to stack on entry to noreturn functions"	Vitaly Buka	2018-04-07	2	-52/+1
\| \| \| \| \| \| \| \|	Breaks ubsan test TestCases/Misc/missing_return.cpp on ARM This reverts commit r329287 llvm-svn: 329486
*	ARM: Do not spill CSR to stack on entry to noreturn functions	Tim Northover	2018-04-05	2	-1/+52
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	A noreturn nounwind function can be expected to never return in any way, and by never returning it will also never have to restore any callee-saved registers for its caller. This makes it possible to skip spills of those registers during function entry, saving some stack space and time in the process. This is rather useful for embedded targets with limited stack space. Should fix PR9970. Patch by myeisha (pmb). llvm-svn: 329287
*	[DAGCombine] Revert r329160	Sam Parker	2018-04-05	1	-102/+4
\| \| \| \| \| \|	Again, broke the big endian stage 2 builders. llvm-svn: 329283
*	[DAGCombine] Improve ReduceLoadWidth for SRL	Sam Parker	2018-04-04	1	-4/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recommitting rL321259. Previously this caused an issue with PPCBE but I didn't receive a reproducer and didn't have the time to follow up. If the issue appears again, please provide a reproducer so I can fix it. Original commit message: If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid. Differential Revision: https://reviews.llvm.org/D41350 llvm-svn: 329160
*	[DAGCombine] (float)((int) f) --> ftrunc (PR36617)	Sanjay Patel	2018-03-31	1	-0/+42
\| \| \| \| \| \| \| \| \| \|	fptosi / fptoui round towards zero, and that's the same behavior as ISD::FTRUNC, so replace a pair of casts with the equivalent node. We don't have to account for special cases (NaN, INF) because out-of-range casts are undefined. Differential Revision: https://reviews.llvm.org/D44909 llvm-svn: 328921
*	[ARM] Support float literals under XO	Christof Douma	2018-03-28	1	-83/+26
\| \| \| \| \| \| \| \| \| \|	Follow up patch of r328313 to support the UseVMOVSR constraint. Removed some unneeded instructions from the test and removed some stray comments. Differential Revision: https://reviews.llvm.org/D44941 llvm-svn: 328691
*	Fix a reoccuring typo in load-combine tests	Artur Pilipenko	2018-03-27	2	-3/+3
\| \| \| \| \| \| \| \| \| \| \|	%tmp = bitcast i32* %arg to i8* %tmp1 = getelementptr inbounds i8, i8* %tmp, i32 0 - %tmp2 = load i8, i8* %tmp, align 1 + %tmp2 = load i8, i8* %tmp1, align 1 This doesn't change the semantics of the tests but makes use of %tmp1 which was originally intended. llvm-svn: 328642
*	Use .set instead of = when printing assignment in assembly output	Krzysztof Parzyszek	2018-03-27	4	-16/+16
\| \| \| \| \| \| \| \| \|	On Hexagon "x = y" is a syntax used in most instructions, and is not treated as a directive. Differential Revision: https://reviews.llvm.org/D44256 llvm-svn: 328635
*	Use local symbols for creating .stack-size.	Rafael Espindola	2018-03-26	1	-2/+4
\| \| \| \|	llvm-svn: 328581
*	[ARM] Support float literals under XO	Christof Douma	2018-03-23	1	-0/+118
\| \| \| \| \| \| \| \| \| \| \| \| \|	When targeting execute-only and fp-armv8, float constants in a compare resulted in instruction selection failures. This is now fixed by using vmov.f32 where possible, otherwise the floating point constant is lowered into a integer constant that is moved into a floating point register. This patch also restores using fpcmp with immediate 0 under fp-armv8. Change-Id: Ie87229706f4ed879a0c0cf66631b6047ed6c6443 llvm-svn: 328313
*	Run dos2unix on a test. NFC.	Rafael Espindola	2018-03-20	1	-30/+30
\| \| \| \|	llvm-svn: 327934
*	[SelectionDAG] Transfer DbgValues when integer operations are promoted	Aaron Smith	2018-03-19	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: DbgValue nodes were not transferred when integer DAG nodes were promoted. For example, if an i32 add node was promoted to an i64 add node by DAGTypeLegalizer::PromoteIntegerResult(), its DbgValue node was not transferred to the new node. The simple fix is to update SetPromotedInteger() to transfer DbgValues. Add AArch64/dbg-value-i8.ll to test this change and fix ARM/debug-info-d16-reg.ll which had the wrong DILocalVariable nodes with arg numbers even though they are not for function parameters. Patch by Se Jong Oh! Reviewers: vsk, JDevlieghere, aprantl Reviewed By: JDevlieghere Subscribers: javed.absar, kristof.beyls, llvm-commits Tags: #debug-info Differential Revision: https://reviews.llvm.org/D44546 llvm-svn: 327919
*	[ARM, AArch64] Check the no-stack-arg-probe attribute for dynamic stack probes	Martin Storsjo	2018-03-19	2	-0/+33
\| \| \| \| \| \| \| \| \| \| \| \| \|	This extends the use of this attribute on ARM and AArch64 from SVN r325900 (where it was only checked for fixed stack allocations on ARM/AArch64, but for all stack allocations on X86). This also adds a testcase for the existing use of disabling the fixed stack probe with the attribute on ARM and AArch64. Differential Revision: https://reviews.llvm.org/D44291 llvm-svn: 327897
*	[ARM] Support for v4f16 and v8f16 vectors	Sjoerd Meijer	2018-03-19	2	-0/+60
\| \| \| \| \| \| \| \| \| \| \| \|	This is the groundwork for adding the Armv8.2-A FP16 vector intrinsics, which uses v4f16 and v8f16 vector operands and return values. All the moving parts are tested with two intrinsics, a 1-operand v8f16 and a 2-operand v4f16 intrinsic. In a follow-up patch the rest of the intrinsics and tests will be added. Differential Revision: https://reviews.llvm.org/D44538 llvm-svn: 327839
*	[ARM] FP16 codegen support for VSEL	Sjoerd Meijer	2018-03-16	1	-1/+39
\| \| \| \| \| \| \| \| \|	This implements lowering of SELECT_CC for f16s, which enables codegen of VSEL with f16 types. Differential Revision: https://reviews.llvm.org/D44518 llvm-svn: 327695
*	[SelectionDAG][ARM][X86] Teach PromoteIntRes_SETCC to do a better job picking the result type for the setcc.	Craig Topper	2018-03-15	1	-19/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously if getSetccResultType returned an illegal type we just fell back to using the default promoted type. This appears to have been to handle the case where for vectors getSetccResultType returns the input type, but the input type itself isn't legal and will need to be promoted. Without the legality check we would never reach a legal type. But just picking the promoted type to be the setcc type can create strange setccs where the result type is 128 bits and the operand type is 256 bits. This can happen, for example, if the result type was promoted to v8i16 from v8i1 but the input type was promoted from v8i23 to v8i32. We currently handle this with custom lowering code in X86. This legality check also caused us to reject the getSetccResultType when the input type needed to be widened or split, even though that result wouldn't have caused legalization to get stuck. This patch tries to fix this by detecting when the getSetccResultType needs to be promoted. If its input type also needs to be promoted we'll try to ask for a new setcc result type based on its eventual promoted value. Otherwise we fall back to the default type to promote to. For any other illegal values we might get back from the initial call to getSetccResultType we just keep it and allow it to be re-legalized later via splitting or widening or scalarizing. llvm-svn: 327683
*	[FastISel] Sink local value materializations to first use	Reid Kleckner	2018-03-14	5	-77/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Local values are constants, global addresses, and stack addresses that can't be folded into the instruction that uses them. For example, when storing the address of a global variable into memory, we need to materialize that address into a register. FastISel doesn't want to materialize any given local value more than once, so it generates all local value materialization code at EmitStartPt, which always dominates the current insertion point. This allows it to maintain a map of local value registers, and it knows that the local value area will always dominate the current insertion point. The downside is that local value instructions are always emitted without a source location. This is done to prevent jumpy line tables, but it means that the local value area will be considered part of the previous statement. Consider this C code: call1(); // line 1 ++global; // line 2 ++global; // line 3 call2(&global, &local); // line 4 Today we end up with assembly and line tables like this: .loc 1 1 callq call1 leaq global(%rip), %rdi leaq local(%rsp), %rsi .loc 1 2 addq $1, global(%rip) .loc 1 3 addq $1, global(%rip) .loc 1 4 callq call2 The LEA instructions in the local value area have no source location and are treated as being on line 1. Stepping through the code in a debugger and correlating it with the assembly won't make much sense, because these materializations are only required for line 4. This is actually problematic for the VS debugger "set next statement" feature, which effectively assumes that there are no registers live across statement boundaries. By sinking the local value code into the statement and fixing up the source location, we can make that feature work. 
This was filed as https://bugs.llvm.org/show_bug.cgi?id=35975 and https://crbug.com/793819. This change is obviously not enough to make this feature work reliably in all cases, but I felt that it was worth doing anyway because it usually generates smaller, more comprehensible -O0 code. I measured a 0.12% regression in code generation time with LLC on the sqlite3 amalgamation, so I think this is worth doing. There are some special cases worth calling out in the commit message: 1. local values materialized for phis 2. local values used by no-op casts 3. dead local value code Local values can be materialized for phis, and this does not show up as a vreg use in MachineRegisterInfo. In this case, if there are no other uses, this patch sinks the value to the first terminator, EH label, or the end of the BB if nothing else exists. Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. Lastly, if the local value register has no other uses, we can delete it. This comes up when fastisel tries two instruction selection approaches and the first materializes the value but fails and the second succeeds without using the local value. Reviewers: aprantl, dblaikie, qcolombet, MatzeB, vsk, echristo Subscribers: dotdash, chandlerc, hans, sdardis, amccarth, javed.absar, zturner, llvm-commits, hiraditya Differential Revision: https://reviews.llvm.org/D43093 llvm-svn: 327581
*	[CodeGen] Use MIR syntax for MachineMemOperand printing	Francis Visoiu Mistrih	2018-03-14	4	-6/+6
\| \| \| \| \| \| \| \| \| \|	Get rid of the "; mem:" suffix and use the one we use in MIR: ":: (load 2)". rdar://38163529 Differential Revision: https://reviews.llvm.org/D42377 llvm-svn: 327580
*	SjLjEHPrepare: Don't reg-to-mem swifterror values	Arnold Schwaighofer	2018-03-14	1	-2/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	swifterror llvm values model the swifterror register as memory at the LLVM IR level. ISel will perform ad hoc mem-to-reg on them. swifterror values are constrained in how they can be used. Spilling them to memory is not allowed. SjLjEHPrepare tried to lower swifterror values to memory, which is unnecessary since the back-end will spill and reload the register as necessary (as long as clobbering calls are marked as such, which is the case here) and further leads to invalid IR because swifterror values can't be stored to memory. rdar://38164004 llvm-svn: 327521
*	[ARM] Fix for PR36577	Sjoerd Meijer	2018-03-07	1	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \|	Don't PerformSHLSimplify if the given node is used by a node that also uses a constant because we may get stuck in an infinite combine loop. bugzilla: https://bugs.llvm.org/show_bug.cgi?id=36577 Patch by Sam Parker. Differential Revision: https://reviews.llvm.org/D44097 llvm-svn: 326882
*	[ARM] Fix codegen for VLD3/VLD4/VST3/VST4 with WB	Florian Hahn	2018-03-02	4	-0/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is broken due to 2 separate bugs: 1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing rules to expand them to non-pseudo MIR. These are selected for ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD. 2) Selection of the right VLD/VST instruction is broken for loads and stores of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called with the MIR opcode for fixed writeback (i.e. the increment is the access size) and call getVLDSTRegisterUpdateOpcode() to select an opcode with register writeback if the base register update is of a different size. Since getVLDSTRegisterUpdateOpcode() only knows about VLD1/VLD2/VST1/VST2, the call is currently conditional on the number of elements in the vector. However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for loads and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not updated, which later leads to a fixed writeback instruction being constructed with an extra operand for the register writeback. This patch addresses the two issues as follows: - it adds the necessary mapping from VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register to VLD1d64Twb_register and VLD1d64Qwb_register respectively. Like for the existing _fixed variants, the cost of these is bumped for unaligned access. - it changes the logic in SelectVLD and SelectVST to call isVLDfixed and isVSTfixed respectively to decide whether the opcode should be updated. It also reworks the logic and comments for pushing the writeback offset operand and r0 operand to clarify the logic: the writeback offset needs to be pushed if it's a register writeback, r0 needs to be pushed if not and the instruction is a VLD1/VLD2/VST1/VST2.
Reviewers: rengolin, t.p.northover, samparker Reviewed By: samparker Patch by Thomas Preud'homme <thomas.preudhomme@arm.com> Differential Revision: https://reviews.llvm.org/D42970 llvm-svn: 326570
*	[TLS] use emulated TLS if the target supports only this mode	Chih-Hung Hsieh	2018-02-28	2	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Emulated TLS is enabled by the llc flag -emulated-tls, which is passed by the clang driver. When llc is called explicitly or from other drivers like LTO, a missing -emulated-tls flag would generate wrong TLS code for targets that support only this mode. Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether emulated TLS code should be generated. Unit tests are modified to run with and without the -emulated-tls flag. Differential Revision: https://reviews.llvm.org/D42999 llvm-svn: 326341
*	[ARM] Lower lower saturate to 0 and lower saturate to -1 using bit-operations	Pablo Barrio	2018-02-28	2	-2/+160
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves. In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or orr, while in ARM/thumb2 mode, which has a flexible second operand, the shift can be folded into a single bic/orr instruction. In most cases this results in smaller code and possibly fewer branches, and in no case larger code than before. Patch by Martin Svanfeldt Reviewers: fhahn, pbarrio, rogfer01 Reviewed By: pbarrio, rogfer01 Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D42574 llvm-svn: 326333
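The bit-operation form of the two saturating expressions can be modelled as follows. This is a sketch in Python rather than the ARM lowering itself: it assumes 32-bit signed inputs, relies on Python's >> being an arithmetic shift for negative integers, and uses hypothetical function names.

```python
def saturate_low_zero(x: int) -> int:
    """Model of `x < 0 ? 0 : x` as a shift followed by a bit-clear."""
    sign = x >> 31    # arithmetic shift: -1 if x is negative, otherwise 0
    return x & ~sign  # clears every bit of a negative x, keeps the rest


def saturate_low_minus_one(x: int) -> int:
    """Model of `x < -1 ? -1 : x` as a shift followed by an OR."""
    sign = x >> 31
    return x | sign   # sets every bit of a negative x, giving -1


if __name__ == "__main__":
    for x in (-2**31, -1000, -2, -1, 0, 1, 1000, 2**31 - 1):
        assert saturate_low_zero(x) == (0 if x < 0 else x)
        assert saturate_low_minus_one(x) == (-1 if x < -1 else x)
    print("bit-operation forms match the conditional forms")
```

The shift produces an all-ones or all-zeros mask from the sign bit, so neither form needs a compare or a branch.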
*	[ARM] Another f16 litpool fix	Sjoerd Meijer	2018-02-27	1	-0/+113
\| \| \| \| \| \| \| \| \| \| \| \| \|	We were always setting the block alignment to 2 bytes in Thumb mode and 4-bytes in ARM mode (r325754, and r325012), but this could cause reducing the block alignment when it already had been aligned (e.g. in Thumb mode when the block is a CPE that was already 4-byte aligned). Patch by Momchil Velikov, I've only added a test. Differential Revision: https://reviews.llvm.org/D43777 llvm-svn: 326232
*	Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"	Geoff Berry	2018-02-27	3	-8/+9
\| \| \| \| \| \| \| \|	Re-enable commit r323991 now that r325931 has been committed to make MachineOperand::isRenamable() check more conservative w.r.t. code changes and opt-in on a per-target basis. llvm-svn: 326208
*	[CodeGen] Don't omit any redundant information in -debug output	Francis Visoiu Mistrih	2018-02-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In r322867, we introduced IsStandalone when printing MIR in -debug output. The default behaviour for that was: 1) If any of MBB, MI, or MO are -debug-printed separately, don't omit any redundant information. 2) When -debug-printing a MF entirely, don't print any redundant information. 3) When printing MIR, don't print any redundant information. I'd like to change 2) to: 2) When -debug-printing a MF entirely, don't omit any redundant information. Differential Revision: https://reviews.llvm.org/D43337 llvm-svn: 326094
*	Recommit: [ARM] f16 constant pool fix	Sjoerd Meijer	2018-02-22	3	-1/+113
\| \| \| \| \| \| \|	This recommits r325754; the modified and failing test case actually didn't need any modifications. llvm-svn: 325765