Summary:
Additional filtering of undesired shifts for targets that do not support them efficiently.
Related to D69116 and D69120.
Applies the TLI.getShiftAmountThreshold hook to prevent undesired generation of shifts for the following IR code:
```
define i16 @testShiftBits(i16 %a) {
entry:
%and = and i16 %a, -64
%cmp = icmp eq i16 %and, 64
%conv = zext i1 %cmp to i16
ret i16 %conv
}
define i16 @testShiftBits_11(i16 %a) {
entry:
%cmp = icmp ugt i16 %a, 63
%conv = zext i1 %cmp to i16
ret i16 %conv
}
define i16 @testShiftBits_12(i16 %a) {
entry:
%cmp = icmp ult i16 %a, 64
%conv = zext i1 %cmp to i16
ret i16 %conv
}
```
The attached diff file shows the piece of code in TargetLowering that is responsible for the generation of shifts in relation to the IR above.
Before this patch, shifts were generated to replace non-legal icmp immediates. However, shifts may be undesirable if they are even more expensive for the target.
For all my previous patches in this series (cited above) I added test cases for the MSP430 target. However, in this case the target is not suitable for showing improvements related to this patch, because the MSP430 does not implement "isLegalICmpImmediate". The default implementation always returns true, so the patched code in TargetLowering is never reached for that target. Targets implementing both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit from this.
The differential effect of this patch can only be shown for the MSP430 by temporarily implementing "isLegalICmpImmediate" to return false for large immediates. This is simulated with the implementation of a command line flag that was incorporated in D69975.
This patch belongs to an initiative to "relax" the generation of shifts by LLVM for targets requiring it.
Reviewers: spatel, lebedev.ri, asl
Reviewed By: spatel
Subscribers: lenary, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69326
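For intuition, the identities behind the icmp-to-shift rewrite can be checked in isolation. A minimal stand-alone sketch (illustrative only, not the TargetLowering code itself):
```
#include <cassert>
#include <cstdint>

// The rewrites guarded by getShiftAmountThreshold(): each i16 compare above
// can be expressed through a 6-bit right shift, which is a poor trade on
// targets (like the MSP430) that shift one bit per instruction.
int main() {
  for (uint32_t A = 0; A <= 0xFFFF; ++A) {
    uint16_t X = (uint16_t)A;
    assert(((X & 0xFFC0) == 64) == ((X >> 6) == 1)); // the and/icmp eq case
    assert((X > 63) == ((X >> 6) != 0));             // the icmp ugt case
    assert((X < 64) == ((X >> 6) == 0));             // the icmp ult case
  }
  return 0;
}
```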
These checks fall out naturally from the current implementation without
needing to be explicitly considered anymore.
macros
Based on Sourabh Singh's patch D69839.
This was arbitrarily appearing in only the last section emitted, which
made tests more sensitive than they needed to be: removing the last
section (like the macinfo section change that's coming after this)
would, surprisingly, move the blank line to the previous section.
The macinfo support was broken in LTO situations: macinfo lists were
terminated only once, so multiple macinfo contributions were correctly
labeled but all continued/flowed into later contributions, with only
one terminator appearing at the end of the section.
Correctly terminate each contribution & fix the parsing to handle this
situation too. The parsing fix is also necessary for dumping linked
binaries - the previous code would stop at the end of the first
contribution - missing all later contributions in a linked binary.
It'd be nice to improve the dumping to print the offsets of each
contribution so it'd be easier to know which CU AT_macro_info refers to
which macinfo contribution.
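A rough model of the parsing fix, as a toy sketch (real DW_MACINFO entry decoding is elided and walkMacinfo is an invented name): the reader keeps consuming terminator-delimited contributions until the end of the section instead of stopping at the first terminator.
```
#include <cstdint>
#include <cstdio>
#include <vector>

struct Contribution { size_t Offset, Size; };

// Walk a (simplified) macinfo section: each contribution is a run of
// nonzero "entry" bytes closed by its own 0 terminator. Payload decoding
// is omitted; the point is the outer loop not stopping after the first 0.
std::vector<Contribution> walkMacinfo(const std::vector<uint8_t> &Section) {
  std::vector<Contribution> Result;
  size_t I = 0;
  while (I < Section.size()) {
    size_t Start = I;
    while (I < Section.size() && Section[I] != 0)
      ++I;                 // skip this contribution's entries
    if (I < Section.size())
      ++I;                 // consume the terminator
    Result.push_back({Start, I - Start});
  }
  return Result;
}

int main() {
  // Two contributions, e.g. from two CUs in a linked binary.
  std::vector<uint8_t> Section = {1, 2, 0, 3, 4, 0};
  for (const Contribution &C : walkMacinfo(Section))
    std::printf("contribution at offset %zu, size %zu\n", C.Offset, C.Size);
}
```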
We had some code for this for 32-bit ARM, but this doesn't really need
to be in target-specific code; generalize it.
(I think this started showing up recently because we added an
optimization that converts pow to powi.)
Differential Revision: https://reviews.llvm.org/D69013
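For context on the pow-to-powi remark: llvm.powi takes an integer exponent, so a target-independent expansion can use exponentiation by squaring. A minimal sketch of that idea (expandPowi is a hypothetical name and INT_MIN handling is simplified):
```
#include <cstdio>

// pow(x, n) for integer n via exponentiation by squaring: O(log n)
// multiplies, with a final reciprocal for negative exponents.
double expandPowi(double X, int N) {
  bool Negative = N < 0;
  long long E = Negative ? -(long long)N : N; // widen to survive INT_MIN
  double Result = 1.0;
  while (E > 0) {
    if (E & 1)
      Result *= X;
    X *= X;
    E >>= 1;
  }
  return Negative ? 1.0 / Result : Result;
}

int main() {
  std::printf("%g\n", expandPowi(2.0, 10)); // 1024
  std::printf("%g\n", expandPowi(2.0, -2)); // 0.25
}
```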
Refactor the isCopyInstrImpl, isCopyInstr and isAddImmediate methods
to return an optional pair of destination and source machine operands.
Patch by Nikola Prica
Differential Revision: https://reviews.llvm.org/D69622
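The shape of that refactoring, as a stand-alone sketch with toy types (LLVM's real version deals in MachineOperands; all names below are illustrative):
```
#include <cstdio>
#include <optional>

struct Operand { int Reg; };
struct Instr { int Opcode; Operand Ops[2]; };
struct DestSourcePair { Operand Dest, Src; };

// Report "this instruction copies Src to Dest" as a value instead of via
// out-parameters; std::nullopt means "not a copy".
std::optional<DestSourcePair> isCopyInstr(const Instr &MI) {
  const int CopyOpcode = 1; // toy opcode
  if (MI.Opcode != CopyOpcode)
    return std::nullopt;
  return DestSourcePair{MI.Ops[0], MI.Ops[1]};
}

int main() {
  Instr MI{1, {{7}, {3}}};
  if (auto DS = isCopyInstr(MI))
    std::printf("copy: r%d <- r%d\n", DS->Dest.Reg, DS->Src.Reg);
}
```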
This triggered asserts in the Chromium build, see https://crbug.com/1022729 for
details and reproducer.
> Without this change, when a nested tag type of any kind (enum, class,
> struct, union) is used as a variable type, it is emitted without
> emitting the parent type. In CodeView, parent types point to their inner
> types, and inner types do not point back to their parents. We already
> walk over all of the parent scopes to build the fully qualified name.
> This change simply requests their type indices as we go along to ensure
> they are all emitted.
>
> Fixes PR43905
>
> Reviewers: akhuang, amccarth
>
> Differential Revision: https://reviews.llvm.org/D69924
Summary:
The greedy register allocator occasionally decides to insert a large number of
unnecessary copies; see below for an example. The -consider-local-interval-cost
option (which X86 already enables by default) fixes this. We enable this option
for AArch64 only after receiving feedback that this change is not beneficial for
PowerPC.
We evaluated the impact of this change on compile time, code size and
performance benchmarks.
This option has a small impact on compile time, measured on CTMark: a
0.1% geomean regression at -O1 and -O2, and 0.2% geomean at -O3, with at
most 0.5% on individual benchmarks.
The effect on both code size and performance on AArch64 for the LLVM test suite
is nil on the geomean with individual outliers (ignoring short exec_times)
between:
              best    worst
size..text   -3.3%    +0.0%
exec_time    -5.8%    +2.3%
On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at
most) in code size on some benchmarks, with a tiny movement (-0.01%) on the
geomean. Neither intrate nor fprate show any change in performance.
This patch makes the following changes.
- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
- Ensures that -consider-local-interval-cost=false can disable the new
behaviour if necessary.
This matrix multiply example:
$ cat test.c
long A[8][8];
long B[8][8];
long C[8][8];

void run_test() {
  for (int k = 0; k < 8; k++) {
    for (int i = 0; i < 8; i++) {
      for (int j = 0; j < 8; j++) {
        C[i][j] += A[i][k] * B[k][j];
      }
    }
  }
}
results in the following generated code on AArch64:
$ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
[...]
// %for.cond1.preheader
// =>This Inner Loop Header: Depth=1
add x14, x11, x9
str q0, [sp, #16] // 16-byte Folded Spill
ldr q0, [x14]
mov v2.16b, v15.16b
mov v15.16b, v14.16b
mov v14.16b, v13.16b
mov v13.16b, v12.16b
mov v12.16b, v11.16b
mov v11.16b, v10.16b
mov v10.16b, v9.16b
mov v9.16b, v8.16b
mov v8.16b, v31.16b
mov v31.16b, v30.16b
mov v30.16b, v29.16b
mov v29.16b, v28.16b
mov v28.16b, v27.16b
mov v27.16b, v26.16b
mov v26.16b, v25.16b
mov v25.16b, v24.16b
mov v24.16b, v23.16b
mov v23.16b, v22.16b
mov v22.16b, v21.16b
mov v21.16b, v20.16b
mov v20.16b, v19.16b
mov v19.16b, v18.16b
mov v18.16b, v17.16b
mov v17.16b, v16.16b
mov v16.16b, v7.16b
mov v7.16b, v6.16b
mov v6.16b, v5.16b
mov v5.16b, v4.16b
mov v4.16b, v3.16b
mov v3.16b, v1.16b
mov x12, v0.d[1]
fmov x15, d0
ldp q1, q0, [x14, #16]
ldur x1, [x10, #-256]
ldur x2, [x10, #-192]
add x9, x9, #64 // =64
mov x13, v1.d[1]
fmov x16, d1
ldr q1, [x14, #48]
mul x3, x15, x1
mov x14, v0.d[1]
fmov x17, d0
mov x18, v1.d[1]
fmov x0, d1
mov v1.16b, v3.16b
mov v3.16b, v4.16b
mov v4.16b, v5.16b
mov v5.16b, v6.16b
mov v6.16b, v7.16b
mov v7.16b, v16.16b
mov v16.16b, v17.16b
mov v17.16b, v18.16b
mov v18.16b, v19.16b
mov v19.16b, v20.16b
mov v20.16b, v21.16b
mov v21.16b, v22.16b
mov v22.16b, v23.16b
mov v23.16b, v24.16b
mov v24.16b, v25.16b
mov v25.16b, v26.16b
mov v26.16b, v27.16b
mov v27.16b, v28.16b
mov v28.16b, v29.16b
mov v29.16b, v30.16b
mov v30.16b, v31.16b
mov v31.16b, v8.16b
mov v8.16b, v9.16b
mov v9.16b, v10.16b
mov v10.16b, v11.16b
mov v11.16b, v12.16b
mov v12.16b, v13.16b
mov v13.16b, v14.16b
mov v14.16b, v15.16b
mov v15.16b, v2.16b
ldr q2, [sp] // 16-byte Folded Reload
fmov d0, x3
mul x3, x12, x1
[...]
With -consider-local-interval-cost the same section of code results in the
following:
$ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
[...]
.LBB0_1: // %for.cond1.preheader
// =>This Inner Loop Header: Depth=1
add x14, x11, x9
ldp q0, q1, [x14]
ldur x1, [x10, #-256]
ldur x2, [x10, #-192]
add x9, x9, #64 // =64
mov x12, v0.d[1]
fmov x15, d0
mov x13, v1.d[1]
fmov x16, d1
ldp q0, q1, [x14, #32]
mul x3, x15, x1
cmp x9, #512 // =512
mov x14, v0.d[1]
fmov x17, d0
fmov d0, x3
mul x3, x12, x1
[...]
Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
Reviewed By: dmgreen
Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69437
This reverts commit b7b170c to give the author more time to address failing tests on the expensive checks buildbots.
Without this change, when a nested tag type of any kind (enum, class,
struct, union) is used as a variable type, it is emitted without
emitting the parent type. In CodeView, parent types point to their inner
types, and inner types do not point back to their parents. We already
walk over all of the parent scopes to build the fully qualified name.
This change simply requests their type indices as we go along to ensure
they are all emitted.
Fixes PR43905
Reviewers: akhuang, amccarth
Differential Revision: https://reviews.llvm.org/D69924
optimization
This is a partial fix for the issues described in the commit message of 027aa27 (the revert of G24609). Unfortunately, I can't provide test coverage for it on its own, as the only (known) wrong example is still wrong, but due to a separate issue.
These fixes are cases where, when performing unrelated DAG combines, we were dropping the atomicity flags entirely.
Summary:
This patch adds MIR parsing and printing for heap alloc markers, which were
added in D69136. They are printed as an operand similar to pre-/post-instr
symbols, with a heap-alloc-marker token and a metadata node.
Reviewers: rnk
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69864
Summary:
G_GEP is rather poorly named. It's a simple pointer+scalar addition and
doesn't support any of the complexities of getelementptr. I therefore
propose that we rename it. There's a G_PTR_MASK, so let's follow that
convention and go with G_PTR_ADD.
Reviewers: volkan, aditya_nandakumar, bogner, rovka, arsenm
Subscribers: sdardis, jvesely, wdng, nhaehnle, hiraditya, jrtc27, atanasyan, arphaman, Petar.Avramovic, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69734
Fixes cppcheck warning.
NFCI.
Summary:
To drive the automaton we used a uint64_t as an action type. This
contained the transition's resource requirements as a conjunction:
(a OR b) AND (b OR c)
We encoded this conjunction as a sequence of four 16-bit bitmasks.
This limited the number of addressable functional units to 16, which
is quite low and has bitten many people in the past.
Instead, the DFAEmitter now generates a lookup table from InstrItinerary
class (index of the ItinData inside the ProcItineraries) to an internal
action index which is essentially a dense embedding of the conjunctive
form. Because we never materialize the conjunctive form, we no longer
have the 16 FU restriction.
In this patch we limit ourselves to 64 functional units, due to using a
uint64_t bitmask in the DFAEmitter. Now that we've decoupled these
representations, we can increase this in the future.
Reviewers: ThomasRaoux, kparzysz, majnemer
Reviewed By: ThomasRaoux
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69110
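To make the decoupling concrete: once the conjunction is kept symbolically, it can be one bitmask per OR-clause, checked against the available functional units, with the unit count bounded only by the mask width. A minimal sketch (invented names, not the DFAEmitter code):
```
#include <cstdint>
#include <cstdio>
#include <vector>

// Each element is one OR-clause (a bitmask of acceptable functional
// units); the vector as a whole is the AND of its clauses. Unlike four
// packed 16-bit masks in a uint64_t, this places no limit on the number
// of clauses and allows up to 64 units per clause.
using Conjunction = std::vector<uint64_t>;

bool canTransition(const Conjunction &Required, uint64_t AvailableFUs) {
  for (uint64_t Clause : Required)
    if ((Clause & AvailableFUs) == 0)
      return false; // some clause has no free unit
  return true;
}

int main() {
  const uint64_t A = 1 << 0, B = 1 << 1, C = 1 << 2;
  Conjunction Req = {A | B, B | C}; // (a OR b) AND (b OR c)
  std::printf("%d\n", canTransition(Req, B));     // 1: b satisfies both
  std::printf("%d\n", canTransition(Req, A));     // 0: (b OR c) fails
  std::printf("%d\n", canTransition(Req, A | C)); // 1
}
```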
NFCI.
This adds AA to Post-RA Machine Scheduling, allowing the pass more
freedom when handling memory operations.
My understanding is that this was just never done, not that it is
inherently incorrect to do so. The older PostRA List scheduler already
makes use of AA, it's just that the MI PostRA Scheduler was never taught
to use it.
Differential Revision: https://reviews.llvm.org/D69814
Summary:
The functions replaceStoreOfFPConstant() and OptimizeFloatStore() both
replace a store of a float with a store of an integer unconditionally.
However, this generates wrong code when the store being replaced is an
indexed or truncating store. This commit solves the issue by adding an
early return in these functions when the store being considered is not
a normal store.
The bug was only observed on out-of-tree targets, hence the lack of a
testcase in this commit.
Reviewers: efriedma
Subscribers: hiraditya, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D68420
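For intuition on why the unconditional replacement breaks truncating stores: truncating a float's bit pattern is not a floating-point truncation. A small stand-alone illustration:
```
#include <cstdint>
#include <cstdio>
#include <cstring>

// An IEEE-754 f32 keeps its sign, exponent and high mantissa bits in the
// upper half of the word, so a 16-bit *integer* truncating store of the
// bit pattern discards exactly the bits that matter.
int main() {
  float F = 1.0f; // bit pattern 0x3F800000
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof F);
  std::printf("f32 bits 0x%08X, truncated to i16: 0x%04X\n",
              (unsigned)Bits, (unsigned)(uint16_t)Bits);
  // A real f32->f16 truncating store of 1.0 would produce 0x3C00,
  // not the 0x0000 an integer truncation yields.
}
```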
In the ARM backend, for historical reasons we have only some targets
using Machine Scheduling. The rest use the old list scheduler, as they
are using itineraries and the list scheduler seems to produce better
code (and not crash by running out of registers on v6m code). So whether
to use the MIScheduler or not is checked at runtime from the subtarget
features.
This is fine, except for post-ra scheduling. Whether to use the old
post-ra list scheduler or the post-ra machine scheduler is decided as
the pass manager is set up, in ARM's case from a newly constructed
subtarget. In some situations, like LTO, this won't include the correct
cpu, so it can pick the wrong option. This can have a surprising effect
on performance.
To fix that, this patch overrides targetSchedulesPostRAScheduling and
addPreSched2 in the ARM backend, adding _both_ post-ra schedulers and
picking at runtime which to execute. To pick between the two I've had to
add an enablePostRAMachineScheduler() method that normally returns
enableMachineScheduler() && enablePostRAScheduler(), which can be
overridden to enable just one of PostRAMachineScheduler vs
PostRAScheduler.
Thanks to David Penry for identifying this problem.
Differential Revision: https://reviews.llvm.org/D69775
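The hook structure described above, reduced to a toy sketch (invented fields; in LLVM these methods would live on the subtarget):
```
#include <cstdio>

struct Subtarget {
  bool UseMISched = false;
  bool UsePostRA = false;
  bool enableMachineScheduler() const { return UseMISched; }
  bool enablePostRAScheduler() const { return UsePostRA; }
  // Default behaviour per the description: run the post-RA machine
  // scheduler when both flags are on; a target can override the logic
  // to force exactly one of the two schedulers.
  bool enablePostRAMachineScheduler() const {
    return enableMachineScheduler() && enablePostRAScheduler();
  }
};

// Both schedulers are added to the pipeline; which one runs is decided
// at runtime from the actual subtarget, so LTO gets the right choice.
void runPostRA(const Subtarget &ST) {
  if (ST.enablePostRAMachineScheduler())
    std::puts("post-RA machine scheduler");
  else if (ST.enablePostRAScheduler())
    std::puts("old post-RA list scheduler");
}

int main() {
  Subtarget MISchedCore{true, true};    // e.g. a core with a machine model
  Subtarget ItineraryCore{false, true}; // e.g. an itinerary-based core
  runPostRA(MISchedCore);
  runPostRA(ItineraryCore);
}
```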
With a few things fixed:
- initialisation of the optimisation remark pass (this was causing the buildbot
failures on PPC),
- a test case.
Differential Revision: https://reviews.llvm.org/D69660
Summary:
- Define Instruction::Freeze, making it a UnaryOperator
- Add support for freeze to LLLexer/LLParser/BitcodeReader/BitcodeWriter
The format is `%x = freeze <ty> %v`
- Add support for freeze instruction to llvm-c interface.
- Add m_Freeze in PatternMatch.
- Erase freeze when lowering IR to SelDag.
Reviewers: deadalnix, hfinkel, efriedma, lebedev.ri, nlopes, jdoerfert, regehr, filcab, delcypher, whitequark
Reviewed By: lebedev.ri, jdoerfert
Subscribers: jfb, kristof.beyls, hiraditya, lebedev.ri, steven_wu, dexonsmith, xbolva00, delcypher, spatel, regehr, trentxintong, vsk, filcab, nlopes, mehdi_amini, deadalnix, llvm-commits
Differential Revision: https://reviews.llvm.org/D29011
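A quick way to exercise the new instruction is through IRBuilder; a minimal sketch (assuming a CreateFreeze helper accompanies this work; link against LLVM's Core library):
```
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Emits: define i32 @demo(i32 %v) { %x = freeze i32 %v; ret i32 %x }
int main() {
  LLVMContext Ctx;
  Module M("freeze_demo", Ctx);
  auto *FnTy = FunctionType::get(Type::getInt32Ty(Ctx),
                                 {Type::getInt32Ty(Ctx)}, false);
  auto *F = Function::Create(FnTy, Function::ExternalLinkage, "demo", M);
  IRBuilder<> B(BasicBlock::Create(Ctx, "entry", F));
  Value *Frozen = B.CreateFreeze(&*F->arg_begin(), "x");
  B.CreateRet(Frozen);
  M.print(outs(), nullptr);
}
```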
Continuation of:
D69116
Contributes to a fix for PR43559:
https://bugs.llvm.org/show_bug.cgi?id=43559
See also D69099 and D69116
Use the TLI hook in DAGCombiner.cpp to guard against creating
shift nodes that are not optimal for a target.
Patch by: @joanlluch (Joan LLuch)
Differential Revision: https://reviews.llvm.org/D69120
Small refactoring in visitConstrainedFPIntrinsic that should make
it easier to create DAG nodes requiring extra arguments. That is
the case currently only for STRICT_FP_ROUND, but may be the case
for additional nodes (in particular compares) in the future.
Extracted from the patch for D69281.
NFC.
MachineVerifier::visitMachineFunctionAfter() is extended to check the
live-through case for live-in lists. This is only done for registers
without aliases that are neither allocatable nor reserved, such as the
SystemZ::CC register.
The MachineVerifier previously only caught the case of a live-in use
without an entry in the live-in list (as "using an undefined physical
register").
A comment in LivePhysRegs.h has been added stating a guarantee that
addLiveOuts() can be trusted for a full register both before and after
register allocation.
Review: Quentin Colombet
https://reviews.llvm.org/D68267
size isn't power of 2
Summary:
For the test case below, we get an assertion failure on all targets except AArch64 and ARM:
declare i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
define i8 @test_v3i8(<3 x i8> %a) nounwind {
  %b = call i8 @llvm.experimental.vector.reduce.and.i8.v3i8(<3 x i8> %a)
  ret i8 %b
}
In the function getShuffleReduction(), we can see that it requires the vector size to be a power of 2.
This patch fixes that error for the llvm.experimental.vector.reduce.* functions when the number of elements is not a power of 2.
Reviewed By: jsji
Differential Revision: https://reviews.llvm.org/D68625
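Conceptually, a shuffle reduction wants a power-of-2 lane count, so a non-power-of-2 input can first be padded with the operation's identity element (all-ones for AND). A stand-alone sketch of that idea, assumed to mirror the spirit of the fix rather than quoting it:
```
#include <cstdint>
#include <cstdio>
#include <vector>

// AND-reduce N lanes by padding to the next power of 2 with the AND
// identity (0xFF for i8), then repeatedly halving, the way a
// shuffle-based reduction does.
uint8_t reduceAnd(std::vector<uint8_t> Lanes) {
  size_t N = 1;
  while (N < Lanes.size())
    N *= 2;
  Lanes.resize(N, 0xFF); // identity padding leaves the result unchanged
  while (N > 1) {
    N /= 2;
    for (size_t I = 0; I < N; ++I)
      Lanes[I] &= Lanes[I + N];
  }
  return Lanes[0];
}

int main() {
  std::printf("0x%02X\n", reduceAnd({0xF3, 0xB7, 0x37})); // 0x33
}
```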
More code reuse, better basis for modelling
debug_loc/loclists/loclists.dwo emission support.
== / != 1 -> (setcc) != / == (setcc) to the right place
We need to be checking the value types for the inner setccs not
the outer setcc. We need to ensure those setccs produce a 0/1
value or that the xor is on the i1 type. I think at the time
this code was originally written, getBooleanContents didn't
take any arguments so this was probably correct. But now we can
have a different boolean contents for integer and floating point.
Not sure why the other combines below the xor were also checking
the boolean contents. None of them involve any setccs other than
the outer one and they only produce a new setcc.
Differential Revision: https://reviews.llvm.org/D69480
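The 0/1 requirement is easy to check in isolation: XOR with 1 inverts a 0/1 boolean but not a 0/-1 one (as produced under ZeroOrNegativeOneBooleanContent). A small illustration:
```
#include <cstdio>

// The combine rewrites (setcc ^ 1) as the inverted setcc, which is only
// sound when setcc yields 0/1: XOR with 1 flips 0 <-> 1, but applied to
// a 0/-1 boolean it produces a non-boolean value instead of the inverse.
int main() {
  for (int B : {0, 1})  // ZeroOrOne booleans: B ^ 1 is the inverse
    std::printf("0/1:  %2d ^ 1 = %2d\n", B, B ^ 1);
  for (int B : {0, -1}) // ZeroOrNegativeOne booleans: B ^ 1 is not
    std::printf("0/-1: %2d ^ 1 = %2d\n", B, B ^ 1);
}
```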
before checking for begin
If there are debug instructions before the stopping point,
we need to skip over them before checking for begin, in order
to avoid having the debug instructions affect behavior.
Fixes PR43758.
Differential Revision: https://reviews.llvm.org/D69606
Summary:
The general Function::hasAddressTaken has two issues that make it
inappropriate for our purposes:
1. it is sensitive to dead constant users (PR43858 / crbug.com/1019970),
leading to different codegen when debug info is enabled
2. it considers direct calls via a function cast to be address escapes
The first is fixable, but the second is not, because IPO clients rely on
this behavior. They assume this function means that all call sites are
analyzable for IPO purposes.
So, implement our own analysis, which gets closer to finding functions
that may be indirect call targets.
Reviewers: ajpaverd, efriedma, hans
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69676
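The replacement analysis presumably follows the usual use-walking pattern; a rough sketch against LLVM's C++ API (not the committed code, and the handling of calls through a function cast from point 2 is elided):
```
#include "llvm/IR/Constants.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/InstrTypes.h"

using namespace llvm;

// Conservatively decide whether F may be an indirect call target: any use
// other than the callee slot of a direct call counts as an escape, but
// dead constant users (issue 1 above) are ignored.
static bool mayBeIndirectCallTarget(const Function &F) {
  for (const Use &U : F.uses()) {
    const User *FU = U.getUser();
    if (const auto *C = dyn_cast<Constant>(FU))
      if (C->use_empty())
        continue; // dead constant user: irrelevant to codegen
    if (const auto *CB = dyn_cast<CallBase>(FU))
      if (CB->isCallee(&U))
        continue; // F is directly called here; its address doesn't escape
    return true;  // stored, compared, passed as data, ...
  }
  return false;
}
```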
Summary:
Make sure RAGreedy informs LiveDebugVariables about new VRegs
that are introduced at spill time by the InlineSpiller.
Consider this example:
LDV: !"var" [48r;128r):0 Loc0=%2
48B %2 = ...
...
128B %7 = ADD %2, ...
If %2 is spilled the InlineSpiller will insert spill/reload
instructions and introduces some new vregs. So we get
48B %4 = ...
56B spill %4
...
120B reload %5
128B %3 = ADD %5, ...
In the past we did not inform LDV about this, and when reintroducing
DBG_VALUE instruction LDV still got information that "var" had the
location of the spilled register %2 for the interval [48r;128r).
The result was bad, since we mapped "var" to the spill slot even
before the spill happened:
%4 = ...
DBG_VALUE %spill.0, !"var"
spill %4 to %spill.0
...
reload %5
%3 = ADD %5, ...
This patch will inform LDV about the interval split introduced
due to spilling. So the location map in LDV will become
!"var" [48r;56r):1 [56r;120r):0 [120r;128r):2 Loc0=%2 Loc1=%4 Loc2=%5
And when inserting DBG_VALUE instructions we get
%4 = ...
DBG_VALUE %4, !"var"
spill %4 to %spill.0
DBG_VALUE %spill.0, !"var"
...
reload %5
DBG_VALUE %5, !"var"
%3 = ADD %5, ...
Fixes: https://bugs.llvm.org/show_bug.cgi?id=38899
Reviewers: jmorse, vsk, aprantl
Reviewed By: jmorse
Subscribers: dstenb, wuzish, MatzeB, qcolombet, nemanjai, hiraditya, jsji, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69584
For AMDGPU this is dependent on the FP mode, which should eventually
not be a property of the subtarget.
D69580/llvmorg-10-init-8797-g0d987e411ac
Move TargetLoweringBase::isSuitableForJumpTable from
llvm/CodeGen/TargetLowering.h to .cpp, to avoid the undefined reference
from all LLVM${Target}ISelLowering.cpp.
Another fix is to add a dependency on TransformUtils to all
lib/Target/$Target/LLVMBuild.txt, but that is too disruptive.
Summary:
(Split of off D67120)
TargetLowering/TargetTransformationInfo/SwitchLoweringUtils changes for profile
guided size optimization.
Reviewers: davidxl
Subscribers: eraman, hiraditya, haicheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D69580
destination and source pair as a return value; NFC
This is breaking MSVC builds: http://lab.llvm.org:8011/builders/llvm-clang-x86_64-expensive-checks-win/builds/20375
Summary:
If a wrapper around one of the mem* stdlib functions bitcasts the returned
pointer value before returning it (e.g. to a wchar_t*), LLVM does not emit a
tail call.
Add a check for this scenario so that we emit a tail call.
Reviewers: wmi, mkuper, ramred01, dmgreen
Reviewed By: wmi, dmgreen
Subscribers: hiraditya, sanwou01, javed.absar, lebedev.ri, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D59078
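The scenario in source form, as a stand-alone illustration of such a wrapper (whether the tail call is actually emitted of course depends on target and options):
```
#include <cstdio>
#include <cstring>
#include <cwchar>

// memcpy returns void*; the cast back to wchar_t* is the "bitcast of the
// returned pointer" that used to block tail-call emission.
wchar_t *wmemcpy_wrapper(wchar_t *Dst, const wchar_t *Src, std::size_t N) {
  return static_cast<wchar_t *>(std::memcpy(Dst, Src, N * sizeof(wchar_t)));
}

int main() {
  wchar_t Src[] = L"hi", Dst[3];
  std::printf("%ls\n", wmemcpy_wrapper(Dst, Src, 3));
}
```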
For AMDGPU this depends on whether denormals are enabled in the
default FP mode for the function. Currently this is treated as a
subtarget feature, so FMAD is selectively legal based on that. I want
to move this out of the subtarget features so this can be controlled
with a denormal mode attribute. Additionally, this will allow folding
based on a future ftz fast math flag.