path: root/llvm/lib/CodeGen
Commit message | Author | Age | Files | Lines
...
* [LegalizeTypes] Remove SoftenFloat handling from ExpandIntRes_LLROUND_LLRINT ↵Craig Topper2019-11-171-9/+0
| | | | | | | | | and remove the assert from the strict fp path. These were both recently added. While the call to GetSoftenedFloat is a little more optimal, we don't do it in the expand for FP_TO_SINT/UINT, so there's no real reason to do it here. This avoids a FIXME for strict fp.
* [LegalizeTypes] Remove unnecessary conversion from EVT to MVT to ↵Craig Topper2019-11-171-8/+8
| | | | MVT::SimpleValueType just to assign back to EVT. NFC
* [LegalizeTypes][X86] Add support for expanding the result type of ↵Craig Topper2019-11-171-3/+18
| | | | | | | | | STRICT_LLROUND and STRICT_LLRINT. This doesn't handle softening the input type, but we don't handle softening any of the strict nodes yet. Skipping that made it easy to reuse an existing function for creating a libcall from a node with a chain.
* [LegalizeTypes] When expanding the integer result of LLROUND/LLRINT, also ↵Craig Topper2019-11-171-0/+6
| | | | | | | | | call GetSoftenedFloat if the floating point input needs to be softened. Before this we were emitting a bitcast to integer from the lowering code that itself will need to be legalized. By calling GetSoftenedFloat we get the integer conversion in one step without needing to relegalize a bitcast.
* [LegalizeTypes] Remove PromoteFloat support from ExpandIntRes_LLROUND_LLRINT.Craig Topper2019-11-171-5/+6
| | | | | | | | | | | This code isn't exercised, and was in the wrong place. If we needed this, we would need to promote the type before figuring out which libcall to use. I'm choosing to remove it rather than fix it, since we don't support PromoteFloat for LRINT/LROUND/LLRINT/LLROUND when the result type is legal, so I don't see much reason to support it for the case where the result type isn't legal.
* [LegalizeTypes] Merge ExpandIntRes_LLROUND and ExpandIntRes_LLRINT into one ↵Craig Topper2019-11-172-44/+31
| | | | | | | | | function that handles both. NFC These two functions were the same except for which libcall gets emitted. Just merge them into one. This is prep work for some other work, including strict fp support.
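For illustration, a minimal sketch of what the merged routine could look like; the libcall selection and call shape here are assumptions for illustration, not the committed code (f80/ppcf128 cases elided):

```cpp
// Sketch: one routine expands both LLROUND and LLRINT; only the libcall,
// chosen from the opcode and operand type, differs.
void DAGTypeLegalizer::ExpandIntRes_LLROUND_LLRINT(SDNode *N, SDValue &Lo,
                                                   SDValue &Hi) {
  SDValue Op = N->getOperand(0);
  EVT VT = Op.getValueType();
  bool IsRound = N->getOpcode() == ISD::LLROUND;

  RTLIB::Libcall LC;
  if (VT == MVT::f32)
    LC = IsRound ? RTLIB::LLROUND_F32 : RTLIB::LLRINT_F32;
  else if (VT == MVT::f64)
    LC = IsRound ? RTLIB::LLROUND_F64 : RTLIB::LLRINT_F64;
  else
    LC = IsRound ? RTLIB::LLROUND_F128 : RTLIB::LLRINT_F128;

  SDLoc dl(N);
  TargetLowering::MakeLibCallOptions CallOptions;
  SDValue Call =
      TLI.makeLibCall(DAG, LC, N->getValueType(0), Op, CallOptions, dl).first;
  SplitInteger(Call, Lo, Hi); // split the wide integer result into Lo/Hi
}
```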
* [DWARF5] Addition of alignment attribute in typedef DIE.Sourabh Singh Tomar2019-11-161-0/+9
| | | | | | | | | | | | This patch adds support for the DW_AT_alignment [DWARF5] attribute, to be emitted with the typedef DIE when explicit alignment is specified. Patch by Awanish Pandey <Awanish.Pandey@amd.com> Reviewers: aprantl, dblaikie, jini.susan.george, SouraVX, alok, deadalnix Differential Revision: https://reviews.llvm.org/D70111
* DebugInfo: Use loclistx for DWARFv5 location lists to reduce the number of ↵David Blaikie2019-11-153-3/+15
| | | | | | | | | | relocations This only implements the non-dwo part, but loclistx is necessary to use location lists in DWARFv5, so it's a precursor to that work - and generally reduces relocations (only using one reloc, then indexes/relative offsets for all location list references) in non-split DWARF.
* [GISel][CombinerHelper] Use uses() instead of operands() when traversing use ↵Quentin Colombet2019-11-151-9/+2
| | | | | | operands. NFC
* [GISel][CombinerHelper] Add support for scalar type for the result of ↵Quentin Colombet2019-11-151-3/+17
| | | | | | | | | | | | | | | | | | shuffle vector. LLVM IR 1-element vectors get lowered to scalars in GISel. As a result, a shuffle vector may also produce a scalar. This patch teaches the shuffle combiner how to deal with scalars when they are the destination type of a shuffle vector. For now, we just support the easy case where this can be lowered to a plain copy. For other cases, we leave the shuffle vector as is. This type of IR is seen in O0 pipelines, e.g., as produced with SingleSource/UnitTests/Vector/AArch64/aarch64_neon_intrinsics.c. rdar://problem/57198904
* [MirNamer][Canonicalizer]: Perform instruction semantic based renamingAditya Nandakumar2019-11-154-369/+143
| https://reviews.llvm.org/D70210
|
| Previously:
| - Due to the sensitivity of the algorithm to gaps and extra instructions,
|   naming when diffing was often off by a few, making the diff unreadable
|   even for tests with only 7 and 8 instructions respectively.
| - Naming can change depending on candidates (and the order of picking
|   candidates). One extra instruction somewhere can cause the entire
|   subtree to be named completely differently.
| - There is no consistent naming of similar instructions which occur in
|   different functions. If we try to do something like count the frequency
|   distribution of various differences across a suite, these sensitivity
|   issues are going to give poor results.
|
| Instead, name instructions based on the semantics of the instruction (a
| hash of the opcode and operands), so a given instruction is named the same
| wherever it occurs in any module/function. This has some nice properties:
| - It's easy to look at many instructions and just check the hash: if
|   they're named similarly, it's the same instruction. This makes it very
|   easy to spot the same instruction both multiple times and across many
|   functions (useful for frequency distributions).
| - The naming is independent of traversal/candidates/depth of the graph.
|   There is no need to keep track of last index/gaps/skip counts etc., and
|   no off-by-a-few issues with diffs.
| - The main loop is simplified (simple iteration), with no tracking of
|   what's visited and what's not. Collisions are handled just by
|   incrementing a counter, giving names roughly of the form
|   bb[N]_hash_[CollisionCount].
|
| I've tried the old vs. new implementation on files ranging from 30 to 700
| instructions. With the old algorithm the diffs are a sea of red, whereas
| with the semantic version the diffs line up beautifully.
|
| Additionally, with the new implementation we can probably avoid hoisting
| instructions to various places, as they'll likely be named the same, with
| differences only in the collision count (i.e., regardless of whether the
| instruction is hoisted or not / close to its use or not, it gets the same
| hash, so uses of the instruction stay identical up to the collision
| count), which is very easy to spot visually.
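As a rough illustration of the naming scheme described above (the hash recipe, the helper name, and the exact name format are assumptions, not the committed code):

```cpp
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/Twine.h"
#include "llvm/CodeGen/MachineInstr.h"

// Sketch: name an instruction from a hash of its opcode and operands so the
// same instruction gets the same name anywhere it appears; collisions are
// disambiguated with a counter, roughly bb[N]_hash_[CollisionCount].
static std::string semanticName(const llvm::MachineInstr &MI, unsigned BBNum,
                                llvm::DenseMap<unsigned, unsigned> &Seen) {
  llvm::hash_code Hash = llvm::hash_value(MI.getOpcode());
  for (const llvm::MachineOperand &MO : MI.operands())
    Hash = llvm::hash_combine(Hash, llvm::hash_value(MO));
  unsigned H = unsigned(Hash) % 100000; // short, printable suffix
  unsigned Collision = Seen[H]++;       // bump per-hash collision counter
  return ("bb" + llvm::Twine(BBNum) + "_" + llvm::Twine(H) + "_" +
          llvm::Twine(Collision)).str();
}
```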
* Add read-only data assembly writing for aixdiggerlin2019-11-151-0/+4
| | | | | | | | | | SUMMARY: The patch will emit read-only variable assembly code for AIX. Reviewers: daltenty, Xiangling_Liao Subscribers: rupprecht, seiyai, hiraditya Differential Revision: https://reviews.llvm.org/D70182
* Move floating point related entities to namespace levelSerge Pavlov2019-11-151-2/+1
| | | | | | | | | | | | | | Enumerations that describe rounding mode and exception behavior were defined inside ConstrainedFPIntrinsic. It makes sense to use the same definitions to represent the same properties in other cases, not only in constrained intrinsics. It was, however, inconvenient, as it required including the constrained intrinsics definitions even when they were not needed. Also, the long scope prefix reduced readability. This change moves these definitions to the namespace llvm::fp. No functional changes. Differential Revision: https://reviews.llvm.org/D69552
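A sketch of the resulting namespace-level definitions; treat the exact enumerator names as assumptions based on the description above:

```cpp
#include <cstdint>

namespace llvm {
namespace fp {

/// Rounding mode of a constrained FP operation, now usable anywhere,
/// not just inside ConstrainedFPIntrinsic.
enum RoundingMode : uint8_t {
  rmDynamic,    ///< Rounding mode is determined at run time.
  rmToNearest,  ///< Round to nearest, ties to even.
  rmDownward,   ///< Round toward negative infinity.
  rmUpward,     ///< Round toward positive infinity.
  rmTowardZero  ///< Round toward zero (truncate).
};

/// How strictly FP exception semantics must be preserved.
enum ExceptionBehavior : uint8_t {
  ebIgnore,  ///< FP exceptions can be ignored.
  ebMayTrap, ///< FP exceptions may trap; results are not inspected.
  ebStrict   ///< FP exception semantics are preserved exactly.
};

} // namespace fp
} // namespace llvm
```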
* [CodeGen] Increase the size of a SmallVectorJay Foad2019-11-151-1/+1
| The SmallVector reserve() call in MachineInstrExpressionTrait::getHashValue
| accounted for over 3% of all calls to malloc() when I compiled a bunch of
| graphics shaders for the AMDGPU target. Its initial size was only enough
| for machine instructions with up to 7 operands, but for AMDGPU 8 and 10
| operands are very common. Here's a histogram of number of operands for
| each call to getHashValue, gathered from the same collection of shaders:
|
|   operands  calls
|          1  13503
|          2  254273
|          3  135781
|          4  422508
|          5  614997
|          6  194953
|          7  287248
|          8  1517255
|          9  31218
|         10  1191269
|         11  70731
|         12  24
|         13  77
|         15  84
|         17  4692
|         27  16
|         33  705
|         49  6
|
| Typical instructions with 8 and 10 operands are floating point arithmetic
| and multiply-accumulate instructions like:
|
|   %83:vgpr_32 = V_MUL_F32_e64 0, killed %82:vgpr_32, 0, killed %81:vgpr_32,
|       0, 0, implicit $exec
|   %330:vgpr_32 = V_MAC_F32_e64 0, killed %327:vgpr_32, 0, killed
|       %329:sgpr_32, 0, %328:vgpr_32(tied-def 0), 0, 0, implicit $exec
|
| Differential Revision: https://reviews.llvm.org/D70301
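A sketch of the hashing routine with a larger inline buffer; the exact new capacity is an assumption (anything that covers the common 10-operand case plus the opcode avoids the allocation):

```cpp
// Sketch: with a bigger inline capacity, the reserve() below stays within
// the stack buffer even for AMDGPU's common 8- and 10-operand instructions,
// avoiding a heap allocation per hashed instruction.
unsigned MachineInstrExpressionTrait::getHashValue(
    const MachineInstr *const &MI) {
  SmallVector<size_t, 16> HashComponents; // previously SmallVector<size_t, 8>
  HashComponents.reserve(MI->getNumOperands() + 1);
  HashComponents.push_back(MI->getOpcode());
  for (const MachineOperand &MO : MI->operands()) {
    if (MO.isReg() && MO.isDef() && Register::isVirtualRegister(MO.getReg()))
      continue; // skip virtual register defs; they don't affect equivalence
    HashComponents.push_back(hash_value(MO));
  }
  return hash_combine_range(HashComponents.begin(), HashComponents.end());
}
```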
* GlobalISel: Lower s1 source G_SITOFP/G_UITOFPMatt Arsenault2019-11-151-0/+16
|
* Add missing includes needed to prune LLVMContext.h include, NFCReid Kleckner2019-11-143-0/+3
| | | | | These are a pre-requisite to removing #include "llvm/Support/Options.h" from LLVMContext.h: https://reviews.llvm.org/D70280
* [DebugInfo] Allow spill slots in call site parameter descriptionsVedant Kumar2019-11-142-5/+23
| | | | | | | | | | | | | | | | | | | | | | Allow call site parameter descriptions to reference spill slots. Spill slots are not visible to high-level LLVM IR, so they can safely be referenced during entry value evaluation (as they cannot be clobbered by some other function). This gives a 5% increase in the number of call site parameter DIEs in an LTO x86_64 build of the xnu kernel. This reverts commit eb4c98ca3d2590bad9f6542afbf3a7824d2b53fa ([DebugInfo] Exclude memory location values as parameter entry values), effectively reintroducing the portion of D60716 which dealt with memory locations (authored by Djordje, Nikola, Ananth, and Ivan). This partially addresses llvm.org/PR43343. However, not all memory operands forwarded to callees live in spill slots. In the xnu build, it may be possible to use an escape analysis to increase the number of call site parameters by another 15% (more details in PR43343). Differential Revision: https://reviews.llvm.org/D70254
* [globalisel][irtranslator] The IRTranslator should preserve TBAA informationDaniel Sanders2019-11-141-5/+15
|
* [Pipeliner] Fix an assertion caused by iterator invalidation.Sumanth Gundapaneni2019-11-141-2/+3
|
* [ExpandReductions] Don't push all intrinsics to the worklist. Just push ↵Craig Topper2019-11-141-10/+29
| | | | | | | | | | | | | | | | | reductions. We were previously pushing all intrinsics used in a function to the worklist. This is wasteful for memory in a function with a lot of intrinsics. We also ask TTI if we should expand every intrinsic, but we only have expansion support for the reduction intrinsics. This just wastes time for the non-reduction intrinsics. This patch only pushes reduction intrinsics into the worklist and skips other intrinsics. Differential Revision: https://reviews.llvm.org/D69470
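A sketch of the narrowed worklist construction (the reduction-ID list is abridged and the surrounding pass structure assumed):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/IntrinsicInst.h"

// Sketch: push only the reduction intrinsics this pass knows how to expand,
// rather than every intrinsic call in the function.
static bool isExpandableReduction(llvm::Intrinsic::ID ID) {
  switch (ID) {
  case llvm::Intrinsic::experimental_vector_reduce_v2_fadd:
  case llvm::Intrinsic::experimental_vector_reduce_v2_fmul:
  case llvm::Intrinsic::experimental_vector_reduce_add:
  case llvm::Intrinsic::experimental_vector_reduce_mul:
  case llvm::Intrinsic::experimental_vector_reduce_and:
  case llvm::Intrinsic::experimental_vector_reduce_or:
  case llvm::Intrinsic::experimental_vector_reduce_xor:
    return true; // min/max and fmin/fmax variants elided for brevity
  default:
    return false;
  }
}

static void collectReductions(
    llvm::Function &F, llvm::SmallVectorImpl<llvm::IntrinsicInst *> &WL) {
  for (llvm::Instruction &I : llvm::instructions(F))
    if (auto *II = llvm::dyn_cast<llvm::IntrinsicInst>(&I))
      if (isExpandableReduction(II->getIntrinsicID()))
        WL.push_back(II); // non-reduction intrinsics are skipped entirely
}
```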
* Replace wrongly deleted header banner, fix formattingReid Kleckner2019-11-142-11/+20
| | | | | | | | I reviewed the diff hunks of 05da2fe52162c80dfa that don't contain '#include' lines, and found two unintended changes. I deleted a header banner inadvertently while inserting a header, and changed the indentation of a constructor in an odd way. Add back the banner, and reformat the constructor.
* [DAGCombiner] Drop redundant DAG method param. NFCPaweł Bylica2019-11-141-4/+3
|
* [DAGCombiner] Use TLI field already available. NFCPaweł Bylica2019-11-141-3/+0
|
* Move CodeGenFileType enum to Support/CodeGen.hReid Kleckner2019-11-131-2/+2
| | | | | | | Avoids the need to include TargetMachine.h from various places just for an enum. Various other enums live here, such as the optimization level, TLS model, etc. Data suggests that this change probably doesn't matter, but it seems nice to have anyway.
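The relocated enum presumably looks something like this in Support/CodeGen.h (enumerator names assumed to follow the CGFT_ prefix convention):

```cpp
namespace llvm {

/// These enums are meant to be passed into addPassesToEmitFile to indicate
/// what type of file to emit, and returned by it to indicate what type of
/// file could actually be made.
enum CodeGenFileType {
  CGFT_AssemblyFile,
  CGFT_ObjectFile,
  CGFT_Null // Do not emit any output.
};

} // namespace llvm
```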
* Sink all InitializePasses.h includesReid Kleckner2019-11-1394-36/+142
| This file lists every pass in LLVM, and is included by Pass.h, which is
| very popular. Every time we add, remove, or rename a pass in LLVM, it
| caused lots of recompilation.
|
| I found this fact by looking at this table, which is sorted by the number
| of times a file was changed over the last 100,000 git commits multiplied
| by the number of object files that depend on it in the current checkout:
|
|   recompiles  touches  affected_files  header
|       342380       95            3604  llvm/include/llvm/ADT/STLExtras.h
|       314730      234            1345  llvm/include/llvm/InitializePasses.h
|       307036      118            2602  llvm/include/llvm/ADT/APInt.h
|       213049       59            3611  llvm/include/llvm/Support/MathExtras.h
|       170422       47            3626  llvm/include/llvm/Support/Compiler.h
|       162225       45            3605  llvm/include/llvm/ADT/Optional.h
|       158319       63            2513  llvm/include/llvm/ADT/Triple.h
|       140322       39            3598  llvm/include/llvm/ADT/StringRef.h
|       137647       59            2333  llvm/include/llvm/Support/Error.h
|       131619       73            1803  llvm/include/llvm/Support/FileSystem.h
|
| Before this change, touching InitializePasses.h would cause 1345 files to
| recompile. After this change, touching it only causes 550 compiles in an
| incremental rebuild.
|
| Reviewers: bkramer, asbirlea, bollu, jdoerfert
|
| Differential Revision: https://reviews.llvm.org/D70211
* Sink MachineFunction private method out of lineReid Kleckner2019-11-131-0/+9
| | | | | | | This method is private and only called from this file and doesn't need to be inline. Saves a TargetMachine.h include in MachineFunction.h, a popular header. The include was introduced in 98603a8153086 despite the forward decl of LLVMTargetMachine.
* [TargetLowering] Increase the storage size of NumRegistersForVT to allow the ↵Craig Topper2019-11-131-1/+4
| | | | | | | | | | type break down for v256i1 and other types to be stored correctly. v256i1 on X86 without AVX-512 breaks down to 256 i8 values when passed between basic blocks, but NumRegistersForVT was sized at one byte per VT, so 256 was stored as 0. This patch enlarges the type to 16 bits and adds an assert to ensure that no information is lost when the entry is stored. Differential Revision: https://reviews.llvm.org/D70138
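A minimal model of the widened table and the new assert (the struct and setter here are hypothetical; the real code fills the table inside TargetLoweringBase):

```cpp
#include <cassert>
#include <cstdint>

// Model: each entry was a uint8_t, so storing 256 (v256i1 -> 256 x i8 on
// X86 without AVX-512) silently truncated to 0.
struct RegisterCountTable {
  uint16_t NumRegistersForVT[256]; // was uint8_t; indexed by simple VT

  void set(unsigned SimpleVT, unsigned NumRegs) {
    NumRegistersForVT[SimpleVT] = NumRegs;
    // New assert: catch any future breakdown that overflows the entry.
    assert(NumRegistersForVT[SimpleVT] == NumRegs &&
           "NumRegistersForVT cannot represent this register count");
  }
};
```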
* [LiveInterval] Allow updating subranges with slightly out-dated IRQuentin Colombet2019-11-132-10/+21
| During register coalescing, we update the live-intervals on-the-fly. To do
| that we are in this strange mode where the live-intervals can be slightly
| out-of-sync (more precisely they are forward looking) compared to what the
| IR actually represents. This happens because the register coalescer only
| updates the IR when it is done with updating the live-intervals, and it
| has to do it this way because updating the IR on-the-fly would actually
| clobber some information on what the live-ranges that are being updated
| look like. This is problematic for updates that rely on the IR to
| accurately represent the state of the live-ranges. Right now, we have only
| one of those: stripValuesNotDefiningMask.
|
| To reconcile this need for out-of-sync IR, this patch introduces a new
| argument to LiveInterval::refineSubRanges that allows the code doing the
| live range updates to reason about how the code will look after the
| coalescer has rewritten the registers. Essentially this captures how a
| subregister index will be offset to match its position in a new register
| class.
|
| E.g., let's say we want to merge:
|   V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
|
| We do that by choosing a class where sub1:<2 x s32> and sub3:<4 x s32>
| overlap, i.e., by choosing a class where we can find "offset + 1 == 3".
| Put differently, we align V2's sub3 with V1's sub1:
|   V2: sub0 sub1 sub2 sub3
|   V1: <offset>  sub0 sub1
|
| This offset will look like a composed subregidx in the class:
|   V1.sub1:<2 x s32> = COPY V2.sub3:<4 x s32>
|   => V1.(composed sub2 with sub1):<4 x s32> = COPY V2.sub3:<4 x s32>
|
| Now if we didn't rewrite the uses and def of V1, all the checks for V1
| would need to account for this offset to match what the live intervals
| intend to capture. Prior to this patch, we would fail to recognize the
| uses and def of V1 and would end up with machine verifier errors ("No
| live segment at def"). This could lead to miscompiles, as we would drop
| some live-ranges and thus miss some interferences.
|
| For this problem to trigger, we need to reach stripValuesNotDefiningMask
| while having a mismatch between the IR and the live-ranges (i.e., we have
| to apply a subreg offset to the IR). This requires the following three
| conditions:
| 1. An update of overlapping subreg lanes: e.g., dsub0 == <ssub0, ssub1>
| 2. An update with Tuple registers with a possibility to coalesce the
|    subreg index: e.g., v1.dsub_1 == v2.dsub_3
| 3. Subreg liveness enabled, i.e., looking at the IR to decide what is
|    alive and what is not (calling stripValuesNotDefiningMask) based on
|    the information the coalescer maintains for the live-ranges.
|
| None of the targets that currently use subreg liveness (i.e., the targets
| that fulfill #3: Hexagon, AMDGPU, PowerPC, and SystemZ IIRC) expose #1
| and #2, so this patch also artificially enables subreg liveness for ARM,
| so that a nice test case can be attached.
* Fix typo in DwarfDebug [NFC]David Stenberg2019-11-131-1/+1
|
* [DebugInfo] Avoid creating entry values for clobbered registersDavid Stenberg2019-11-131-4/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Entry values are considered for parameters that have register-described DBG_VALUEs in the entry block (along with other conditions). If a parameter's value has been propagated from the caller to the callee, then the parameter's DBG_VALUE in the entry block may be described using a register defined by some instruction, and entry values should not be emitted for the parameter, which can currently occur. One such case was seen in the attached test case, in which the second parameter, which is described by a redefinition of the first parameter's register, would incorrectly get an entry value using the first parameter's register. This commit intends to solve such cases by keeping track of register defines, and ignoring DBG_VALUEs in the entry block that are described by such registers. In a RelWithDebInfo build of clang-8, the average size of the set was 27, and in a RelWithDebInfo+ASan build it was 30. Reviewers: djtodoro, NikolaPrica, aprantl, vsk Reviewed By: djtodoro, vsk Subscribers: hiraditya, llvm-commits Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D69889
* [DebugInfo] Add helper for finding entry value candidates [NFC]David Stenberg2019-11-131-28/+61
| | | | | | | | | | | | | | | | | | Summary: The conditions that are used to determine if entry values should be emitted for a parameter are quite many, and will grow slightly in a follow-up commit, so move those to a helper function, as was suggested in the code review for D69889. Reviewers: djtodoro, NikolaPrica Reviewed By: djtodoro Subscribers: probinson, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69955
* [AArch64][SVE] Allocate locals that are scalable vectors.Sander de Smalen2019-11-131-1/+8
| | | | | | | | | | | | This patch adds a target interface to set the StackID for a given type, which allows scalable vectors (e.g. `<vscale x 16 x i8>`) to be assigned a 'sve-vec' StackID, so it is allocated in the SVE area of the stack frame. Reviewers: ostannard, efriedma, rengolin, cameron.mcinally Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D70080
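A sketch of the hook and the AArch64 override (the signature and enumerator names are assumptions based on the description):

```cpp
// Default hook on TargetFrameLowering: which StackID stack objects of
// scalable-vector type should receive.
virtual TargetStackID::Value getStackIDForScalableVectors() const {
  return TargetStackID::Default;
}

// AArch64 override: scalable vectors such as <vscale x 16 x i8> get the
// 'sve-vec' StackID, so frame lowering allocates them in the SVE area.
TargetStackID::Value
AArch64FrameLowering::getStackIDForScalableVectors() const {
  return TargetStackID::SVEVector;
}
```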
* [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (4)joanlluch2019-11-132-9/+9
| Summary:
| Replaces
| ```
| unsigned getShiftAmountThreshold(EVT VT)
| ```
| by
| ```
| bool shouldAvoidTransformToShift(EVT VT, unsigned amount)
| ```
| thus giving more flexibility for targets to decide whether particular
| shift amounts must be considered expensive or not.
|
| Updates the MSP430 target with a custom implementation.
|
| This continues D69116, D69120, D69326 and updates them, so all of them
| must be committed before this. Existing tests apply, a few more have
| been added.
|
| Reviewers: asl, spatel
|
| Reviewed By: spatel
|
| Subscribers: hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D70042
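A sketch of the new hook and a possible MSP430 override; the cutoff value is an assumption for illustration (MSP430 has no barrel shifter, so a shift by N costs roughly N single-bit shifts or a runtime loop):

```cpp
// Default hook on TargetLowering: no shift amount is considered expensive.
virtual bool shouldAvoidTransformToShift(EVT VT, unsigned Amount) const {
  return false;
}

// Possible MSP430 override (threshold assumed): refuse transforms that
// would introduce shifts by more than a few bits.
bool MSP430TargetLowering::shouldAvoidTransformToShift(
    EVT VT, unsigned Amount) const {
  return Amount > 3; // beyond a few single-bit shifts, keep the original
}
```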
* MCP: Fixed bug with dest overlapping copy sourceTim Renouf2019-11-121-0/+9
| | | | | | | | | | | | In MachineCopyPropagation, when propagating the source of a copy into the operand of a later instruction, bail if a destination overlaps (partly defines) the copy source. If the instruction where the substitution is happening is also a copy, allowing the propagation confuses the tracking mechanism. Differential Revision: https://reviews.llvm.org/D69953 Change-Id: Ic570754f878f2d91a4a50a9bdcf96fbaa240726d
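The bail-out described above might look roughly like this (the helper name is hypothetical):

```cpp
// Sketch: before forwarding CopySrc into a use operand of MI, give up if
// any def of MI overlaps (partly defines) the copy source, since that
// confuses the copy-tracking mechanism when MI is itself a copy.
static bool defClobbersCopySource(const MachineInstr &MI, Register CopySrc,
                                  const TargetRegisterInfo *TRI) {
  for (const MachineOperand &MO : MI.defs())
    if (MO.isReg() && TRI->regsOverlap(MO.getReg(), CopySrc))
      return true;
  return false;
}
```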
* [IR] Redefine Freeze instructionaqjune2019-11-122-2/+2
| | | | | | | | | | | | | | | | | | | | Summary: This patch redefines the freeze instruction, changing it from a UnaryOperator to a subclass of UnaryInstruction. ConstantExpr freeze is removed, as discussed in the previous review. FreezeOperator is not added because there's no ConstantExpr freeze. A `freeze i8* null` test is added to `test/Bindings/llvm-c/freeze.ll` as well, because the null pointer-related bug in `tools/llvm-c/echo.cpp` is now fixed. InstVisitor has visitFreeze now because freeze is not a unary operator anymore. Reviewers: whitequark, deadalnix, craig.topper, jdoerfert, lebedev.ri Reviewed By: craig.topper, lebedev.ri Subscribers: regehr, nlopes, mehdi_amini, hiraditya, steven_wu, dexonsmith, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69932
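The resulting class shape is presumably along these lines (constructor signature assumed):

```cpp
// Sketch: freeze as a first-class UnaryInstruction subclass rather than a
// generic UnaryOperator; there is no ConstantExpr form of it.
class FreezeInst : public UnaryInstruction {
public:
  explicit FreezeInst(Value *S, const Twine &NameStr = "",
                      Instruction *InsertBefore = nullptr);

  // Methods for support type inquiry through isa, cast, and dyn_cast:
  static bool classof(const Instruction *I) {
    return I->getOpcode() == Freeze;
  }
  static bool classof(const Value *V) {
    return isa<Instruction>(V) && classof(cast<Instruction>(V));
  }
};
```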
* [PowerPC][XCOFF] Add support for zero initialized global values.Sean Fertile2019-11-111-0/+6
| | | | | | | | For XCOFF, globals mapped into the .bss section are linked as COMMON definitions. This behaviour is incorrect for zero initialized data, so emit those to the .data section instead. Differential Revision: https://reviews.llvm.org/D69528
* Disable hoisting MI to hotter basic blocksVictor Huang2019-11-111-0/+62
| | | | | | | | | The current Hoist() function of the machine LICM pass does not check the frequencies of the source and destination basic blocks that an instruction is hoisted from/to, so there is a chance that an instruction is hoisted from a cold to a hot basic block. This patch adds options to disable machine instruction hoisting if the destination block is hotter. Differential Revision: https://reviews.llvm.org/D63676
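A sketch of the frequency guard (the option and helper names are hypothetical):

```cpp
#include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
#include "llvm/Support/CommandLine.h"

static llvm::cl::opt<bool> DisableHoistingToHotterBlocks(
    "disable-hoisting-to-hotter-blocks", llvm::cl::init(false),
    llvm::cl::Hidden,
    llvm::cl::desc("Disable hoisting instructions to hotter blocks"));

// Sketch: hoisting from a cold block into a hotter preheader increases the
// dynamic execution count of the hoisted instruction, so refuse it.
static bool isHoistProfitableByFreq(
    const llvm::MachineBasicBlock *SrcBB,
    const llvm::MachineBasicBlock *DstBB,
    const llvm::MachineBlockFrequencyInfo *MBFI) {
  if (!DisableHoistingToHotterBlocks)
    return true;
  return MBFI->getBlockFreq(DstBB) <= MBFI->getBlockFreq(SrcBB);
}
```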
* [ModuloSchedule] Fix modulo expansion for data loop carried dependencies.Thomas Raoux2019-11-111-18/+132
| | | | | | | | | | | | | The new experimental expansion has a problem when a value has a data dependency with an instruction from a previous stage. This is due to the way we peel out the kernel. To fix that, I'm changing the way we peel out the kernel: we now peel the kernel NumberStage - 1 times. The code would be correct at this point if we didn't have to handle cases where the loop iteration count is smaller than the number of stages. To handle this case we move instructions between different epilogues based on their stage and remap the PHI instructions correctly. Differential Revision: https://reviews.llvm.org/D69538
* [ModuloSchedule] Do target loop analysis before peeling.Thomas Raoux2019-11-111-4/+2
| | | | | | | | | Simple change to call the target hook analyzeLoopForPipelining before changing the loop. After peeling, analyzing the loop may be more complicated for targets that don't have a loop instruction. This doesn't affect Hexagon and PPC, as they have hardware loop instructions. Differential Revision: https://reviews.llvm.org/D69912
* [CGP] Make ICMP_EQ use CR result of ICMP_S(L|G)T dominatorsYi-Hong Lyu2019-11-111-0/+94
| For example:
|   long long test(long long a, long long b) {
|     if (a << b > 0)
|       return b;
|     if (a << b < 0)
|       return a;
|     return a*b;
|   }
|
| Produces:
|   sld. 5, 3, 4
|   ble 0, .LBB0_2
|   mr 3, 4
|   blr
| .LBB0_2:                                # %if.end
|   cmpldi 5, 0
|   li 5, 1
|   isel 4, 4, 5, 2
|   mulld 3, 4, 3
|   blr
|
| But the compare (cmpldi 5, 0) is redundant and can be removed (CR0
| already contains the result of that comparison).
|
| The root cause of this is that LLVM converts signed comparisons into
| equality comparisons based on dominance. Equality comparisons are
| unsigned by default, so we get either a record-form or cmp (without the
| l for logical) feeding a cmpl. That is the situation we want to avoid
| here.
|
| Differential Revision: https://reviews.llvm.org/D60506
* [ObjC] Override TailCallKind when lowering objc intrinsicsFrancis Visoiu Mistrih2019-11-111-1/+25
| | | | | | | | | | | The tail-call-kind-ness is known by the ObjCARC analysis and can be enforced while lowering the intrinsics to calls. This allows us to get the requested tail calls at -O0 without trying to preserve the attributes throughout passes that change code even at -O0, like the Always Inliner, where the ObjCOpt pass doesn't run. Differential Revision: https://reviews.llvm.org/D69980
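A sketch of the override point (the intrinsic/kind pairing shown is illustrative; the full table lives in the ObjCARC analysis):

```cpp
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"

// Sketch: while lowering an objc.* intrinsic to a runtime call, stamp the
// tail-call kind the ObjCARC semantics require directly on the call.
static void overrideTailCallKind(llvm::CallInst &CI) {
  llvm::Function *F = CI.getCalledFunction();
  if (!F)
    return;
  // e.g. @llvm.objc.autoreleaseReturnValue must be emitted as a tail call
  // for the autorelease-return-value handshake to work at runtime.
  if (F->getIntrinsicID() == llvm::Intrinsic::objc_autoreleaseReturnValue)
    CI.setTailCallKind(llvm::CallInst::TCK_Tail);
}
```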
* [TargetLowering][DAGCombine][MSP430] Shift Amount Threshold in DAGCombine (3)joanlluch2019-11-111-6/+9
| Summary:
| Additional filtering of undesired shifts for targets that do not support
| them efficiently. Related with D69116 and D69120, this applies the
| TLI.getShiftAmountThreshold hook to prevent undesired generation of
| shifts for the following IR code:
|
| ```
| define i16 @testShiftBits(i16 %a) {
| entry:
|   %and = and i16 %a, -64
|   %cmp = icmp eq i16 %and, 64
|   %conv = zext i1 %cmp to i16
|   ret i16 %conv
| }
|
| define i16 @testShiftBits_11(i16 %a) {
| entry:
|   %cmp = icmp ugt i16 %a, 63
|   %conv = zext i1 %cmp to i16
|   ret i16 %conv
| }
|
| define i16 @testShiftBits_12(i16 %a) {
| entry:
|   %cmp = icmp ult i16 %a, 64
|   %conv = zext i1 %cmp to i16
|   ret i16 %conv
| }
| ```
|
| The attached diff file shows the piece of code in TargetLowering that is
| responsible for the generation of shifts in relation to the IR above.
| Before applying this patch, shifts would be generated to replace
| non-legal icmp immediates. However, shifts may be undesired if they are
| even more expensive for the target.
|
| For all my previous patches in this series (cited above) I added test
| cases for the MSP430 target. However, in this case, the target is not
| suitable for showing improvements related to this patch, because the
| MSP430 does not implement "isLegalICmpImmediate". The default
| implementation always returns true, so the patched code in
| TargetLowering is never reached for that target. Targets implementing
| both "isLegalICmpImmediate" and "getShiftAmountThreshold" will benefit
| from this. The differential effect of this patch can only be shown for
| the MSP430 by temporarily implementing "isLegalICmpImmediate" to return
| false for large immediates. This is simulated with a command line flag
| that was incorporated in D69975.
|
| This patch belongs to an initiative to "relax" the generation of shifts
| by LLVM for targets requiring it.
|
| Reviewers: spatel, lebedev.ri, asl
|
| Reviewed By: spatel
|
| Subscribers: lenary, hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69326
* RegisterCoalescer - remove duplicate variable to fix Wshadow warning. NFCI.Simon Pilgrim2019-11-091-3/+2
|
* RegisterCoalescer - fix uninitialized variables. NFCI.Simon Pilgrim2019-11-091-10/+10
|
* Fix operator precedence warning. NFC.Simon Pilgrim2019-11-091-1/+1
|
* DebugInfo: Remove redundant conditionals/checks from macro info emissionDavid Blaikie2019-11-081-18/+8
| | | | | These checks fall out naturally from the current implementation without needing to be explicitly considered anymore.
* DebugInfo: Do not create a debug_macinfo section if no CUs have associated ↵David Blaikie2019-11-081-4/+2
| | | | | | macros Patch based on Sourabh Singh's D69839 patch.
* NVPTX: Don't insert an extra empty line at the end of the last section.David Blaikie2019-11-081-2/+0
| | | | | | | This was arbitrarily appearing in only the last section emitted, which made tests more sensitive than they needed to be: removing the last section (like the macinfo section change that's coming after this) would, surprisingly, move the blank line to the previous section.
* DebugInfo: Use separate macinfo contributions for each CUDavid Blaikie2019-11-081-2/+2
| | | | | | | | | | | | | | | | The macinfo support was broken for LTO situations, by terminating macinfo lists only once - multiple macinfo contributions were correctly labeled, but they all continued/flowed into later contributions until only one terminator appeared at the end of the section. Correctly terminate each contribution & fix the parsing to handle this situation too. The parsing fix is also necessary for dumping linked binaries - the previous code would stop at the end of the first contribution, missing all later contributions in a linked binary. It'd be nice to improve the dumping to print the offsets of each contribution so it'd be easier to know which CU's DW_AT_macro_info refers to which macinfo contribution.
* [AArch64][X86] Don't assume __powidf2 is available on Windows.Eli Friedman2019-11-082-8/+35
| | | | | | | | | | We had some code for this for 32-bit ARM, but this doesn't really need to be in target-specific code; generalize it. (I think this started showing up recently because we added an optimization that converts pow to powi.) Differential Revision: https://reviews.llvm.org/D69013
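The generalized check presumably has roughly this shape (exact code assumed): test libcall availability and fall back to pow when the powi routine is missing from the target's runtime.

```cpp
// Sketch, inside DAG legalization of ISD::FPOWI (f80/f128 cases elided):
RTLIB::Libcall LC = Node->getValueType(0) == MVT::f32 ? RTLIB::POWI_F32
                                                      : RTLIB::POWI_F64;
if (!TLI.getLibcallName(LC)) {
  // No __powisf2/__powidf2 in this runtime (e.g. Windows/MSVC): convert
  // the i32 exponent to FP and use pow instead.
  SDLoc dl(Node);
  EVT VT = Node->getValueType(0);
  SDValue Exp = DAG.getNode(ISD::SINT_TO_FP, dl, VT, Node->getOperand(1));
  Results.push_back(
      DAG.getNode(ISD::FPOW, dl, VT, Node->getOperand(0), Exp));
}
```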