summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* [FPEnv][X86] More strict int <-> FP conversion fixesUlrich Weigand2019-12-236-117/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix several several additional problems with the int <-> FP conversion logic both in common code and in the X86 target. In particular: - The STRICT_FP_TO_UINT expansion emits a floating-point compare. This compare can raise exceptions and therefore needs to be a strict compare. I've made it signaling (even though quiet would also be correct) as signaling is the more usual default for an LT. This code exists both in common code and in the X86 target. - The STRICT_UINT_TO_FP expansion algorithm was incorrect for strict mode: it emitted two STRICT_SINT_TO_FP nodes and then used a select to choose one of the results. This can cause spurious exceptions by the STRICT_SINT_TO_FP that ends up not chosen. I've fixed the algorithm to use only a single STRICT_SINT_TO_FP instead. - The !isStrictFPEnabled logic in DoInstructionSelection would sometimes do the wrong thing because it calls getOperationAction using the result VT. But for some opcodes, incuding [SU]INT_TO_FP, getOperationAction needs to be called using the operand VT. - Remove some (obsolete) code in X86DAGToDAGISel::Select that would mutate STRICT_FP_TO_[SU]INT to non-strict versions unnecessarily. Reviewed by: craig.topper Differential Revision: https://reviews.llvm.org/D71840
* Fix LLVM tool --version build mode printing for MSVCReid Kleckner2019-12-231-1/+23
| | | | | | | | | | | | | | | LLVM tools such as llc print "DEBUG build" or "Optimized build" when passed --version. Before this change, this was implemented by checking for the __OPTIMIZE__ GCC macro. MSVC does not define this macro. For MSVC, control this behavior with _DEBUG instead. It doesn't have precisely the same meaning, but in most configurations, it will do the right thing. Fixes PR17752 Reviewed by: MaskRay Differential Revision: https://reviews.llvm.org/D71817
* [AMDGPU] Don't create MachinePointerInfos with an UndefValue pointerJay Foad2019-12-235-34/+11
| | | | | | | | | | | | | | | Summary: The only useful information the UndefValue conveys is the address space, which MachinePointerInfo can represent directly without referring to an IR value. Reviewers: arsenm, rampitec Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, Petar.Avramovic, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71838
* [DAGCombine] visitEXTRACT_SUBVECTOR - 'little to big' ↵Sanjay Patel2019-12-233-98/+17
| | | | | | | | | | | | | | | | | | | extract_subvector(bitcast()) support This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (for example: extract_subvector(v2i64 bitcast(v16i8))). This allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine. Earlier patches that dealt with regressions initially exposed by this patch: rG5e5e99c041e4 rG0b38af89e2c0 Patch by: @RKSimon (Simon Pilgrim) Differential Revision: https://reviews.llvm.org/D63815
* [Matrix] Use fmuladd for matrix.multiply if allowed.Florian Hahn2019-12-231-5/+25
| | | | | | | | | | | | | If the matrix.multiply calls have the contract fast math flag, we can use fmuladd. This als adds a command line option to force fmuladd generation. We can retire this option once there is a clang-level option. Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70951
* [Matrix] Add forward shape propagation and first shape aware lowerings.Florian Hahn2019-12-231-52/+278
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds infrastructure for forward shape propagation to LowerMatrixIntrinsics. It also updates the pass to make use of the shape information to break up larger vector operations and to eliminate unnecessary conversion operations between columnwise matrixes and flattened vectors: if shape information is available for an instruction, lower the operation to a set of instructions operating on columns. For example, a store of a matrix is broken down into separate stores for each column. For users that do not have shape information (e.g. because they do not yet support shape information aware lowering), we pack the result columns into a flat vector and update those users. It also adds shape aware lowering for the first non-intrinsic instruction: vector stores. Example: For %c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2) store <4 x double> %c, <4 x double>* %Ptr We generate the code below without shape propagation. Note %9 which combines the columns of the transposed matrix into a flat vector. %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1> %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3> %1 = extractelement <2 x double> %split, i64 0 %2 = insertelement <2 x double> undef, double %1, i64 0 %3 = extractelement <2 x double> %split1, i64 0 %4 = insertelement <2 x double> %2, double %3, i64 1 %5 = extractelement <2 x double> %split, i64 1 %6 = insertelement <2 x double> undef, double %5, i64 0 %7 = extractelement <2 x double> %split1, i64 1 %8 = insertelement <2 x double> %6, double %7, i64 1 %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3> store <4 x double> %9, <4 x double>* %Ptr With this patch, we propagate the 2x2 shape information from the transpose to the store and we generate the code below. Note that we store the columns directly and do not need an extra shuffle. %9 = bitcast <4 x double>* %Ptr to double* %10 = bitcast double* %9 to <2 x double>* store <2 x double> %4, <2 x double>* %10, align 8 %11 = getelementptr double, double* %9, i32 2 %12 = bitcast double* %11 to <2 x double>* store <2 x double> %8, <2 x double>* %12, align 8 Reviewers: anemet, Gerolf, reames, hfinkel, andrew.w.kaylor Reviewed By: anemet Differential Revision: https://reviews.llvm.org/D70897
* [yaml2obj] - Allow using an arbitrary value for OSABI.Georgii Rymar2019-12-231-0/+1
| | | | | | | There was no way to set an unsupported or unknown OS ABI. With this patch it is possible to use any numeric value. Differential revision: https://reviews.llvm.org/D71765
* [yaml2obj] - Add support for ELFOSABI_LINUX.Georgii Rymar2019-12-231-0/+1
| | | | | | | ELFOSABI_LINUX is an alias for ELFOSABI_GNU. It is not that obvious probably. Differential revision: https://reviews.llvm.org/D71764
* [AArch64] [Windows] Use COFF stubs for calls to extern_weak functionsMartin Storsjö2019-12-234-8/+18
| | | | | | | | | | | | | | | | | | | As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Improve the classifyGlobalFunctionReference method to set MO_DLLIMPORT/MO_COFFSTUB, and simplify the existing code in AArch64TargetLowering::LowerCall to use the return value from classifyGlobalFunctionReference for these cases. Add code in both AArch64FastISel and GlobalISel/IRTranslator to bail out for function calls to extern weak functions on windows, to let SelectionDAG handle them. This matches what was done for X86 in 6bf108d77a3c. Differential Revision: https://reviews.llvm.org/D71721
* [ARM] [Windows] Use COFF stubs for calls to extern_weak functionsMartin Storsjö2019-12-231-4/+6
| | | | | | | | | | | | | As the extern_weak target might be missing, resolving to the absolute address zero, we can't use the normal direct PC-relative branch instructions (as that would result in relocations out of range). Instead check the shouldAssumeDSOLocal method and load the address from a COFF stub. This matches what was done for X86 in 6bf108d77a3c. Differential Revision: https://reviews.llvm.org/D71720
* [NFC] Style cleanupsShengchen Kan2019-12-232-33/+34
| | | | | | 1. Remove duplicate function for class name at the beginning of the comment. 2. Use auto where the type is already obvious from the context.
* [Power9] Remove the PPCISD::XXREVERSE as it has completely the same ↵QingShan Zhang2019-12-234-23/+5
| | | | | | | | | semantics of ISD::BSWAP The custom node PPCISD::XXREVERSE has completely the same semantics of generic node ISD::BSWAP. We need to clean up it as we have the combine rules for bswap in the base class, while nothing for xxreverse. Differential Revision: https://reviews.llvm.org/D70657
* Fix case style warnings in DIBuilder. NFC.Simon Pilgrim2019-12-231-8/+8
|
* [SLP] Replace NeedToGather variable with enum.Dinar Temirbulatov2019-12-231-22/+34
|
* [AVR] Fix codegen for rotate instructionsJim Lin2019-12-233-4/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch introduces the ROLBRd and RORBRd pseudo-instructions, which implemenent the "traditional" rotate operations; instead of the AVR rotate instructions that use the carry bit. The code is not optimized at all. Especially when dealing with loops of rotate instructions, this codegen should be improved some day. Related bug: 41358 <https://bugs.llvm.org/show_bug.cgi?id=41358> //Note//: This is my first submitted patch. Reviewers: dylanmckay, Jim Reviewed By: dylanmckay Subscribers: hiraditya, llvm-commits, dylanmckay, dsprenkels Tags: #llvm Patched by dsprenkels (Daan Sprenkels) Differential Revision: https://reviews.llvm.org/D60365
* [PowerPC] Exploit `vrl(b|h|w|d)` to perform vector rotationKai Luo2019-12-232-1/+21
| | | | | | | | | Summary: Currently, we set legalization action of `ISD::ROTL` vectors as `Expand` in `PPCISelLowering`. However, we can exploit `vrl(b|h|w|d)` to lower `ISD::ROTL` directly. Differential Revision: https://reviews.llvm.org/D71324
* reland "[DebugInfo] Support to emit debugInfo for extern variables"Yonghong Song2019-12-222-3/+4
| | | | | | | | | | | | | Commit d77ae1552fc21a9f3877f3ed7e13d631f517c825 ("[DebugInfo] Support to emit debugInfo for extern variables") added deebugInfo for extern variables for BPF target. The commit is reverted by 891e25b02d760d0de18c7d46947913b3166047e7 as the committed tests using %clang instead of %clang_cc1 causing test failed in certain scenarios as reported by Reid Kleckner. This patch fixed the tests by using %clang_cc1. Differential Revision: https://reviews.llvm.org/D71818
* [DAGCombiner] Check term use before applying aggressive FSUB optimisationsCarl Ritson2019-12-231-3/+6
| | | | | | | | | | | | | | | | Summary: Without this check unnecessary FMA instructions are generated when the FSUB terms are reused. This also has the side-effect that the same value is computed to different levels of precision, which can create undesirable effects if the results are used together in subsequent computation. Reviewers: arsenm, nhaehnle, foad, tpr, dstuttard, spatel Reviewed By: arsenm Subscribers: jvesely, wdng, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71656
* Revert "[DebugInfo] Support to emit debugInfo for extern variables"Reid Kleckner2019-12-222-4/+3
| | | | | | | This reverts commit d77ae1552fc21a9f3877f3ed7e13d631f517c825. The tests committed along with this change do not pass, and should be changed to use %clang_cc1.
* [SelectionDAG] Copy FP flags when visiting a binary instruction.Valentin Churavy2019-12-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: We noticed in Julia that the sequence below no longer turned into a sequence of FMA instructions in LLVM 7+, but it did in LLVM 6. ``` %29 = fmul contract <4 x double> %wide.load, %wide.load16 %30 = fmul contract <4 x double> %wide.load13, %wide.load17 %31 = fmul contract <4 x double> %wide.load14, %wide.load18 %32 = fmul contract <4 x double> %wide.load15, %wide.load19 %33 = fadd fast <4 x double> %vec.phi, %29 %34 = fadd fast <4 x double> %vec.phi10, %30 %35 = fadd fast <4 x double> %vec.phi11, %31 %36 = fadd fast <4 x double> %vec.phi12, %32 ``` Unlike Clang, Julia doesn't set the `unsafe-fp-math=true` function attribute, but rather emits more local instruction flags. This partially undoes https://reviews.llvm.org/D46854 and if required I can try to minimize the test further. Reviewers: spatel, mcberg2017 Reviewed By: spatel Subscribers: chriselrod, merge_guards_bot, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71495
* Revert "[ARM][TypePromotion] Enable by default"Reid Kleckner2019-12-221-21/+3
| | | | | | | | This reverts commit ee7579409b7d940c4e1314d126e900db30c4edff. It causes crashes during ThinLTO. I suspect the issue is related to races on the global TypeSize variable, which is 80 at the time of the crash.
* [AMDGPU] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-222-4/+4
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71815
* [Hexagon] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-225-10/+10
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71814
* [NVPTX] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-221-1/+1
| | | | | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Also removed the top-level const as requested by Aaron Ballman in similar patches. Differential Revision: https://reviews.llvm.org/D71812
* [PowerPC] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-221-3/+3
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71811
* [Transforms] Fixes -Wrange-loop-analysis warningsMark de Wever2019-12-225-6/+6
| | | | | | This avoids new warnings due to D68912 adds -Wrange-loop-analysis to -Wall. Differential Revision: https://reviews.llvm.org/D71810
* [InstCombine] enhance fold for copysign with known sign argSanjay Patel2019-12-221-8/+12
| | | | | This is another optimization suggested in PRPR44153: https://bugs.llvm.org/show_bug.cgi?id=44153
* [ms] [X86] Use "P" modifier on operands to call instructions in inline X86 ↵Eric Astor2019-12-226-18/+81
| | | | | | | | | | | | | | | | | | | | assembly. Summary: This is documented as the appropriate template modifier for call operands. Fixes PR44272, and adds a regression test. Also adds support for operand modifiers in Intel-style inline assembly. Reviewers: rnk Reviewed By: rnk Subscribers: merge_guards_bot, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D71677
* [AArch64] match splat of bitcasted extract subvector to DUPLANESanjay Patel2019-12-221-7/+43
| | | | | | | | | | This is another potential regression exposed by D63815. Here we peek through a bitcast to find an extract subvector and scale the splat offset based on that: splat (bitcast (extract X, C)), LaneC --> duplane (bitcast X), LaneC' Differential Revision: https://reviews.llvm.org/D71672
* DebugInfo: Remove out of date commentDavid Blaikie2019-12-211-4/+0
|
* Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFC.Simon Pilgrim2019-12-211-1/+1
|
* [InstCombine] check alloc size in bitcast of geps fold (PR44321)Sanjay Patel2019-12-211-4/+6
| | | | | | | | | | We missed a constraint in D44833 when folding a bitcast into a GEP with vector/array types. If the alloc sizes specified by the datalayout don't match, this could miscompile as shown in: https://bugs.llvm.org/show_bug.cgi?id=44321 Differential Revision: https://reviews.llvm.org/D71771
* [SimplifyLibCalls] require fast-math-flags for pow(X, -0.5) transformsSanjay Patel2019-12-211-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | As discussed in PR44330: https://bugs.llvm.org/show_bug.cgi?id=44330 ...the transform from pow(X, -0.5) libcall/intrinsic to reciprocal square root can result in small deviations from the expected result due to differences in the pow() implementation and/or the extra rounding step from the division. This patch proposes to allow that difference with either the 'approximate functions' or 'reassociate' FMF: http://llvm.org/docs/LangRef.html#fast-math-flags In practice, this likely means that the code is compiled with all of 'fast' (-ffast-math), but I have preserved the existing specializations for -0.0/-INF that enable generating safe code if those special values are allowed simultaneously with allowing approximation/reassociation. The question about whether a similar restriction is needed for the non-reciprocal case -- pow(X, 0.5) -- is deferred. That transform is allowed without FMF currently, and this patch does not change that behavior. Differential Revision: https://reviews.llvm.org/D71706
* [AArch64] Respect reserved registers while renaming in LdSt opt.Florian Hahn2019-12-211-1/+4
| | | | | | We cannot pick reserved registers as rename registers. Fixes https://bugs.llvm.org/show_bug.cgi?id=44358
* AMDGPU/GlobalISel: Fix misuse of div_scale intrinsicsMatt Arsenault2019-12-211-5/+5
| | | | | | | | | | | Confusingly, the intrinsic operands do not match the instruction/custom node. The order is shuffled, and the 3rd operand is an immediate to select operands. I'm not 100% sure I did this right, but fdiv still doesn't select end to end and it will be easier to tell when it does. This at least avoids an assertion in RegBankSelect and allows hitting the fallback on selection.
* AMDGPU/GlobalISel: Fix missing scc imp-def on scalar and/or/xorMatt Arsenault2019-12-211-0/+5
|
* AMDGPU/GlobalISel: Simplify codeMatt Arsenault2019-12-211-5/+5
| | | | | This can directly access the register bank, and doesn't need to get it through the ID.
* [ORC] De-register eh-frames in the RTDyldObjectLinkingLayer destructor.Lang Hames2019-12-201-0/+6
| | | | | This matches the behavior of the legacy layer, which automatically deregistered frames.
* [NFC][MachineOutliner] Rewrite setSuffixIndices to be iterativeJessica Paquette2019-12-201-18/+25
| | | | | | | | Having this function be recursive could use up way too much stack space. Rewrite it as an iterative traversal in the tree instead to prevent this. Fixes PR44344.
* [DWARF] Defer creating declaration DIEs until we prepare call site infoVedant Kumar2019-12-201-7/+0
| | | | | | | | It isn't necessary to create DIEs for all of the declaration subprograms in a CU's retainedTypes list. We can defer creating these subprograms until we need to prepare a call site tag that refers to one. This cleanup was mentioned in passing in D70350.
* Reland: [DWARF] Allow cross-CU references of subprogram definitionsVedant Kumar2019-12-204-7/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows a call site tag in CU A to reference a callee DIE in CU B without resorting to creating an incomplete duplicate DIE for the callee inside of CU A. We already allow cross-CU references of subprogram declarations, so it doesn't seem like definitions ought to be special. This improves entry value evaluation and tail call frame synthesis in the LTO setting. During LTO, it's common for cross-module inlining to produce a call in some CU A where the callee resides in a different CU, and there is no declaration subprogram for the callee anywhere. In this case llvm would (unnecessarily, I think) emit an empty DW_TAG_subprogram in order to fill in the call site tag. That empty 'definition' defeats entry value evaluation etc., because the debugger can't figure out what it means. As a follow-up, maybe we could add a DWARF verifier check that a DW_TAG_subprogram at least has a DW_AT_name attribute. Update: Reland with a fix to create a declaration DIE when the declaration is missing from the CU's retainedTypes list. The declaration is left out of the retainedTypes list in two cases: 1) Re-compiling pre-r266445 bitcode (in which declarations weren't added to the retainedTypes list), and 2) Doing LTO function importing (which doesn't update the retainedTypes list). It's possible to handle (1) and (2) by modifying the retainedTypes list (in AutoUpgrade, or in the LTO importing logic resp.), but I don't see an advantage to doing it this way, as it would cause more DWARF to be emitted compared to creating the declaration DIEs lazily. Tested with a stage2 ThinLTO+RelWithDebInfo build of clang, and with a ReleaseLTO-g build of the test suite. rdar://46577651, rdar://57855316, rdar://57840415 Differential Revision: https://reviews.llvm.org/D70350
* [WebAssembly] Use TargetIndex operands in DbgValue to track WebAssembly ↵Yury Delendik2019-12-2012-3/+96
| | | | | | | | | | | | | | | | | | operands locations Extends DWARF expression language to express locals/globals locations. (via target-index operands atm) (possible variants are: non-virtual registers or address spaces) The WebAssemblyExplicitLocals can replace virtual registers to targertindex operand type at the time when WebAssembly backend introduces {get,set,tee}_local instead of corresponding virtual registers. Reviewed By: aprantl, dschuff Tags: #debug-info, #llvm Differential Revision: https://reviews.llvm.org/D52634
* [InstCombine] Improve infinite loop detectionJakub Kuderski2019-12-201-4/+15
| | | | | | | | | | | | | | | | | | | Summary: This patch limits the default number of iterations performed by InstCombine. It also exposes a new option that allows to specify how many iterations is considered getting stuck in an infinite loop. Based on experiments performed on real-world C++ programs, InstCombine seems to perform at most ~8-20 iterations, so treating 1000 iterations as an infinite loop seems like a safe choice. See D71145 for details. The two limits can be specified via command line options. Reviewers: spatel, lebedev.ri, nikic, xbolva00, grosser Reviewed By: spatel Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71673
* Rename DW_AT_LLVM_isysroot to DW_AT_LLVM_sysrootAdrian Prantl2019-12-207-19/+19
| | | | | | | | | | | | This is a purely cosmetic change that is NFC in terms of the binary output. I bugs me that I called the attribute DW_AT_LLVM_isysroot since the "i" is an artifact of GCC command line option syntax (-isysroot is in the category of -i options) and doesn't carry any useful information otherwise. This attribute only appears in Clang module debug info. Differential Revision: https://reviews.llvm.org/D71722
* Add parentheses to silence warningBill Wendling2019-12-201-2/+2
|
* More style cleanups following rG14fc20ca6282 [NFC]Philip Reames2019-12-201-34/+28
| | | | | | | Demote member functions to static functions where possible Use early continue/early return to reduce nesting Clarify comments slightly. Reuse previously define expression in one case.
* Fix a memory leak introduced w/the instruction padding support in rG14fc20ca6282Philip Reames2019-12-201-6/+6
| | | | Should have caught this in review, but only noticed when addressing post commit style items. We were creating a new instance of the X86MCInstrInfo class, and then never reclaiming the memory. This wasn't even conditional on the new off by default flags, so it was an unconditional leak.
* Align branches within 32-Byte boundary (NOP padding)Philip Reames2019-12-204-1/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WARNING: If you're looking at this patch because you're looking for a full performace mitigation of the Intel JCC Erratum, this is not it! This is a preliminary patch on the patch towards mitigating the performance regressions caused by Intel's microcode update for Jump Conditional Code Erratum. For context, see: https://www.intel.com/content/www/us/en/support/articles/000055650.html The patch adds the required assembler infrastructure and command line options needed to exercise the logic for INTERNAL TESTING. These are NOT public flags, and should not be used for anything other than LLVM's own testing/debugging purposes. They are likely to change both in spelling and meaning. WARNING: This patch is knowingly incorrect in some cornercases. We need, and do not yet provide, a mechanism to selective enable/disable the padding. Conversation on this will continue in parellel with work on extending this infrastructure to support prefix padding. The goal here is to have the assembler align specific instructions such that they neither cross or end at a 32 byte boundary. The impacted instructions are: a. Conditional jump. b. Fused conditional jump. c. Unconditional jump. d. Indirect jump. e. Ret. f. Call. The new options for llvm-mc are: -x86-align-branch-boundary=NUM aligns branches within NUM byte boundary. -x86-align-branch=TYPE[+TYPE...] specifies types of branches to align. A new MCFragment type, MCBoundaryAlignFragment, is added, which may emit NOP to align the fused/unfused branch. alignBranchesBegin inserts MCBoundaryAlignFragment before instructions, alignBranchesEnd marks the end of the branch to be aligned, relaxBoundaryAlign grows or shrinks sizes of NOP to align the target branch. Nop padding is disabled when the instruction may be rewritten by the linker, such as TLS Call. Process Note: I am landing a patch by skan as it has been LGTMed, and continuing to iterate on the review is simply slowing us down at this point. We can and will continue to iterate in tree. Patch By: skan Differential Revision: https://reviews.llvm.org/D70157
* [PPC32] Emit R_PPC_PLTREL24 for calls to dso_local ifuncFangrui Song2019-12-201-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | static void *ifunc(void) __attribute__((ifunc("resolver"))); void foo() { ifunc(); } The relocation produced by the ifunc() call: 1. gcc -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000 2. gcc -msecure-plt -PIE => R_PPC_PLTREL24 r_addend=0x8000 3. clang -msecure-plt -fPIC => R_PPC_PLTREL24 r_addend=0x8000 4. clang -msecure-plt -fPIE => R_PPC_REL24 4 is incorrect. The R_PPC_REL24 needs a call stub due to ifunc. If this relocation is mixed with other R_PPC_PLTREL24(r_addend=0x8000) in a function, both GNU ld and lld (after D71621 fix) may produce a wrong result. This patch fixes 4 to use R_PPC_PLTREL24, which matches GCC. Both GNU ld and lld (after D71621) will be happy. Reviewed By: sfertile Differential Revision: https://reviews.llvm.org/D71649
* [X86] Fix a KNL miscompile caused by combineSetCC swapping LHS/RHS variables ↵Craig Topper2019-12-201-19/+23
| | | | | | | | | | | | | | before a later use. The setcc operands are copied into LHS and RHS variables at the top of the function. We also capture the condition code. A later piece of code swaps the operands and changing the CC variable as part of a canonicalization to make some other checks simpler. But we might not make the transform we canonicalized for. So we continue on through the function where we can use the swapped LHS/RHS variables and access the original condition code operand instead of the modified CC variable. This leads to a setcc being created with the original condition code, but with swapped operands. To mitigate this, this patch does a couple things. The LHS/RHS/CC variables are made const to keep them from being modified like this again. The transform that needs the swap now uses temporary copies of the variables. And the transform that used the original condition code operand has been altered to use the CC variable we cached originally. Either of these changes are enough to fix the issue, but doing both to make this code very safe. I also considered rewriting the swap code in some way to check both permutations without explicitly swapping or needing temporary variables, but held off on that. Differential Revision: https://reviews.llvm.org/D71736
OpenPOWER on IntegriCloud