summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* dwarfgen: Add support for generating the debug_str_offsets sectionPavel Labath2018-07-255-27/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The motivation for this is D49493, where we'd like to test details of debug_str_offsets behavior which is difficult to trigger from a traditional test. This adds the plubming necessary for dwarfgen to generate this section. The more interesting changes are: - I've moved emitStringOffsetsTableHeader function from DwarfFile to DwarfStringPool, so I can generate the section header more easily from the unit test. - added a new addAttribute overload taking an MCExpr*. This is used to generate the DW_AT_str_offsets_base, which links a compile unit to the offset table. I've also added a basic test for reading and writing DW_form_strx forms. Reviewers: dblaikie, JDevlieghere, probinson Subscribers: llvm-commits, aprantl Differential Revision: https://reviews.llvm.org/D49670 llvm-svn: 337910
* [SystemZ] Use tablegen loops in SchedModelsJonas Paulsson2018-07-255-229/+98
| | | | | | | | | | NFC changes to make scheduler TableGen files more readable, by using loops instead of a lot of similar defs with just e.g. a latency value that changes. https://reviews.llvm.org/D49598 Review: Ulrich Weigand, Javed Abshar llvm-svn: 337909
* Recommit r333268: [IPSCCP] Use PredicateInfo to propagate facts from cmp ↵Florian Hahn2018-07-252-10/+140
| | | | | | | | | | | | | | | | | | | | instructions. r337828 resolves a PredicateInfo issue with unnamed types. Original message: This patch updates IPSCCP to use PredicateInfo to propagate facts to true branches predicated by EQ and to false branches predicated by NE. As a follow up, we should be able to extend it to also propagate additional facts about nonnull. Reviewers: davide, mssimpso, dberlin, efriedma Reviewed By: davide, dberlin llvm-svn: 337904
* Fix PR34170: Crash on inline asm with 64bit output in 32bit GPRThomas Preud'homme2018-07-251-20/+36
| | | | | | | | Add support for inline assembly with output operand that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR). llvm-svn: 337903
* [llvm-objdump] Add dynamic section printing to private-headers optionPaul Semel2018-07-251-0/+138
| | | | | | Differential Revision: https://reviews.llvm.org/D49016 llvm-svn: 337902
* [x86/SLH] Sink the return hardening into the main block-walk + hardeningChandler Carruth2018-07-251-26/+17
| | | | | | | | | | | code. This consolidates all our hardening calls, and simplifies the code a bit. It seems much more clear to handle all of these together. No functionality changed here. llvm-svn: 337895
* [x86/SLH] Improve name and comments for the main hardening function.Chandler Carruth2018-07-251-174/+190
| | | | | | | | | | | | | | | | | | | This function actually does two things: it traces the predicate state through each of the basic blocks in the function (as that isn't directly handled by the SSA updater) *and* it hardens everything necessary in the block as it goes. These need to be done together so that we have the currently active predicate state to use at each point of the hardening. However, this also made obvious that the flag to disable actual hardening of loads was flawed -- it also disabled tracing the predicate state across function calls within the body of each block. So this patch sinks this debugging flag test to correctly guard just the hardening of loads. Unless load hardening was disabled, no functionality should change with tis patch. llvm-svn: 337894
* [mips] Replace custom parsing logic for data directives by the ↵Simon Atanasyan2018-07-253-42/+12
| | | | | | | | | | | | | | | | | | | | | `addAliasForDirective` The target independent AsmParser doesn't recognise .hword, .word, .dword which are required for Mips. Currently MipsAsmParser recognises these through dispatch to MipsAsmParser::parseDataDirective. This contains equivalent logic to AsmParser::parseDirectiveValue. This patch allows reuse of AsmParser::parseDirectiveValue by making use of addAliasForDirective to support .hword, .word and .dword. Original patch provided by Alex Bradbury at D47001 was modified to fix handling of microMIPS symbols. The `AsmParser::parseDirectiveValue` calls either `EmitIntValue` or `EmitValue`. In this patch we override `EmitIntValue` in the `MipsELFStreamer` to clear a pending set of microMIPS symbols. Differential revision: https://reviews.llvm.org/D49539 llvm-svn: 337893
* [Dominators] Assert if there is modification to DelBB while it is awaiting ↵Chijun Sima2018-07-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | deletion Summary: Previously, passes use ``` DomTreeUpdater DTU(DT, DomTreeUpdater::UpdateStrategy::Lazy); DTU.deleteBB(DelBB); ``` to delete a BasicBlock. But passes which don't have the ability to update DomTree (e.g. tailcallelim, simplifyCFG) cannot recognize a DelBB awaiting deletion and will continue to process this DelBB. This is a simple approach to notify devs of passes which may use DTU in the future to deal with deleted BasicBlocks under Lazy Strategy correctly. Reviewers: kuhar, brzycki, dmgreen Reviewed By: kuhar Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49731 llvm-svn: 337891
* [X86] Use X86ISD::MUL_IMM instead of ISD::MUL for multiply we intend to be ↵Craig Topper2018-07-251-1/+2
| | | | | | | | selected to LEA. This prevents other combines from possibly disturbing it. llvm-svn: 337890
* [RegisterBankInfo] Ignore InstrMappings that create impossible to repair ↵Tom Stellard2018-07-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | operands Summary: This is a follow-up to r303043. In computeMapping(), we need to disqualify an InstrMapping if it would be impossible to repair one of the registers in the instruction to match the mapping. This change is needed in order to be able to define an instruction mapping for G_SELECT for the AMDGPU target and will be tested by test/CodeGen/AMDGPU/GlobalISel/regbankselect-select.mir Reviewers: ab, qcolombet, t.p.northover, dsanders Reviewed By: qcolombet Subscribers: tpr, llvm-commits Differential Revision: https://reviews.llvm.org/D49735 llvm-svn: 337882
* [profile] Support profiling runtime on FuchsiaPetr Hosek2018-07-251-0/+1
| | | | | | | | | | | | | This ports the profiling runtime on Fuchsia and enables the instrumentation. Unlike on other platforms, Fuchsia doesn't use files to dump the instrumentation data since on Fuchsia, filesystem may not be accessible to the instrumented process. We instead use the data sink to pass the profiling data to the system the same sanitizer runtimes do. Differential Revision: https://reviews.llvm.org/D47208 llvm-svn: 337881
* [x86/SLH] Teach the x86 speculative load hardening pass to hardenChandler Carruth2018-07-251-0/+200
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | against v1.2 BCBS attacks directly. Attacks using spectre v1.2 (a subset of BCBS) are described in the paper here: https://people.csail.mit.edu/vlk/spectre11.pdf The core idea is to speculatively store over the address in a vtable, jumptable, or other target of indirect control flow that will be subsequently loaded. Speculative execution after such a store can forward the stored value to subsequent loads, and if called or jumped to, the speculative execution will be steered to this potentially attacker controlled address. Up until now, this could be mitigated by enableing retpolines. However, that is a relatively expensive technique to mitigate this particular flavor. Especially because in most cases SLH will have already mitigated this. To fully mitigate this with SLH, we need to do two core things: 1) Unfold loads from calls and jumps, allowing the loads to be post-load hardened. 2) Force hardening of incoming registers even if we didn't end up needing to harden the load itself. The reason we need to do these two things is because hardening calls and jumps from this particular variant is importantly different from hardening against leak of secret data. Because the "bad" data here isn't a secret, but in fact speculatively stored by the attacker, it may be loaded from any address, regardless of whether it is read-only memory, mapped memory, or a "hardened" address. The only 100% effective way to harden these instructions is to harden the their operand itself. But to the extent possible, we'd like to take advantage of all the other hardening going on, we just need a fallback in case none of that happened to cover the particular input to the control transfer instruction. For users of SLH, currently they are paing 2% to 6% performance overhead for retpolines, but this mechanism is expected to be substantially cheaper. However, it is worth reminding folks that this does not mitigate all of the things retpolines do -- most notably, variant #2 is not in *any way* mitigated by this technique. So users of SLH may still want to enable retpolines, and the implementation is carefuly designed to gracefully leverage retpolines to avoid the need for further hardening here when they are enabled. Differential Revision: https://reviews.llvm.org/D49663 llvm-svn: 337878
* [X86] Use a shift plus an lea for multiplying by a constant that is a power ↵Craig Topper2018-07-251-0/+18
| | | | | | | | of 2 plus 2/4/8. The LEA allows us to combine an add and the multiply by 2/4/8 together so we just need a shift for the larger power of 2. llvm-svn: 337875
* [X86] Expand mul by pow2 + 2 using a shift and two adds similar to what we ↵Craig Topper2018-07-251-11/+15
| | | | | | do for pow2 - 2. llvm-svn: 337874
* [X86] Use a two lea sequence for multiply by 37, 41, and 73.Craig Topper2018-07-241-0/+9
| | | | | | These fit a pattern used by 11, 21, and 19. llvm-svn: 337871
* [X86] Change multiply by 26 to use two multiplies by 5 and an add instead of ↵Craig Topper2018-07-241-7/+7
| | | | | | | | multiply by 3 and 9 and a subtract. Same number of operations, but ending in an add is friendlier due to it being commutable. llvm-svn: 337869
* [LV] Fix for PR38110, LV encountered llvm_unreachable() Hideki Saito2018-07-241-2/+3
| | | | | | | | | | | | | | Summary: truncateToMinimalBitWidths() doesn't handle all Instructions and the worst case is compiler crash via llvm_unreachable(). Fix is to add a case to handle PHINode and changed the worst case to NO-OP (from compiler crash). Reviewers: sbaranga, mssimpso, hsaito Reviewed By: hsaito Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D49461 llvm-svn: 337861
* [SCEV] Add zext(C + x + ...) -> D + zext(C-D + x + ...)<nuw><nsw> transformRoman Tereshin2018-07-241-0/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | if the top level addition in (D + (C-D + x + ...)) could be proven to not wrap, where the choice of D also maximizes the number of trailing zeroes of (C-D + x + ...), ensuring homogeneous behaviour of the transformation and better canonicalization of such expressions. This enables better canonicalization of expressions like 1 + zext(5 + 20 * %x + 24 * %y) and zext(6 + 20 * %x + 24 * %y) which get both transformed to 2 + zext(4 + 20 * %x + 24 * %y) This pattern is common in address arithmetics and the transformation makes it easier for passes like LoadStoreVectorizer to prove that 2 or more memory accesses are consecutive and optimize (vectorize) them. Reviewed By: mzolotukhin Differential Revision: https://reviews.llvm.org/D48853 llvm-svn: 337859
* [X86] When expanding a multiply by a negative of one less than a power of 2, ↵Craig Topper2018-07-241-10/+12
| | | | | | | | | | like 31, don't generate a negate of a subtract that we'll never optimize. We generated a subtract for the power of 2 minus one then negated the result. The negate can be optimized away by swapping the subtract operands, but DAG combine doesn't know how to do that and we don't add any of the new nodes to the worklist anyway. This patch makes use explicitly emit the swapped subtract. llvm-svn: 337858
* [X86] Generalize the multiply by 30 lowering to generic multipy by power 2 ↵Craig Topper2018-07-241-15/+10
| | | | | | | | | | minus 2. Use a left shift and 2 subtracts like we do for 30. Move this out from behind the slow lea check since it doesn't even use an LEA. Use this for multiply by 14 as well. llvm-svn: 337856
* [X86] Change multiply by 19 to use (9 * X) * 2 + X instead of (5 * X) * 4 - 1.Craig Topper2018-07-241-2/+2
| | | | | | The new lowering can be done in 2 LEAs. The old code took 1 LEA, 1 shift, and 1 sub. llvm-svn: 337851
* [MachineOutliner][NFC] Move outlined function remark into its own functionJessica Paquette2018-07-241-31/+33
| | | | | | | This pulls the OutlinedFunction remark out into its own function to make the code a bit easier to read. llvm-svn: 337849
* [MachineOutliner][NFC] Move target frame info into OutlinedFunctionJessica Paquette2018-07-245-26/+27
| | | | | | | | | | | | | | Just some gardening here. Similar to how we moved call information into Candidates, this moves outlined frame information into OutlinedFunction. This allows us to remove TargetCostInfo entirely. Anywhere where we returned a TargetCostInfo struct, we now return an OutlinedFunction. This establishes OutlinedFunctions as more of a general repeated sequence, and Candidates as occurrences of those repeated sequences. llvm-svn: 337848
* Put "built-in" function definitions in global Used list, for LTO. (fix bug ↵Peter Collingbourne2018-07-243-3/+15
| | | | | | | | | | | | | | 34169) When building with LTO, builtin functions that are defined but whose calls have not been inserted yet, get internalized. The Global Dead Code Elimination phase in the new LTO implementation then removes these function definitions. Later optimizations add calls to those functions, and the linker then dies complaining that there are no definitions. This CL fixes the new LTO implementation to check if a function is builtin, and if so, to not internalize (and later DCE) the function. As part of this fix I needed to move the RuntimeLibcalls.{def,h} files from the CodeGen subidrectory to the IR subdirectory. I have updated all the files that accessed those two files to access their new location. Fixes PR34169 Patch by Caroline Tice! Differential Revision: https://reviews.llvm.org/D49434 llvm-svn: 337847
* [x86] Teach the x86 backend that it can fold between TCRETURNm* and ↵Chandler Carruth2018-07-242-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCRETURNr* and fix latent bugs with register class updates. Summary: Enabling this fully exposes a latent bug in the instruction folding: we never update the register constraints for the register operands when fusing a load into another operation. The fused form could, in theory, have different register constraints on its operands. And in fact, TCRETURNm* needs its memory operands to use tailcall compatible registers. I've updated the folding code to re-constrain all the registers after they are mapped onto their new instruction. However, we still can't enable folding in the general case from TCRETURNr* to TCRETURNm* because doing so may require more registers to be available during the tail call. If the call itself uses all but one register, and the folded load would require both a base and index register, there will not be enough registers to allocate the tail call. It would be better, IMO, to teach the register allocator to *unfold* TCRETURNm* when it runs out of registers (or specifically check the number of registers available during the TCRETURNr*) but I'm not going to try and solve that for now. Instead, I've just blocked the forward folding from r -> m, leaving LLVM free to unfold from m -> r as that doesn't introduce new register pressure constraints. The down side is that I don't have anything that will directly exercise this. Instead, I will be immediately using this it my SLH patch. =/ Still worse, without allowing the TCRETURNr* -> TCRETURNm* fold, I don't have any tests that demonstrate the failure to update the memory operand register constraints. This patch still seems correct, but I'm nervous about the degree of testing due to this. Suggestions? Reviewers: craig.topper Subscribers: sanjoy, mcrosier, hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D49717 llvm-svn: 337845
* [Inliner] Teach inliner to merge 'min-legal-vector-width' function attributeCraig Topper2018-07-241-0/+27
| | | | | | | | | | | | When we inline a function with a min-legal-vector-width attribute we need to make sure the caller also ends up with at least that vector width. This patch is necessary to make always_inline functions like intrinsics propagate their min-legal-vector-width. Though nothing uses min-legal-vector-width yet. A future patch will add heuristics to preventing inlining with different vector width mismatches. But that code would need to be in inline cost analysis which is separate from the code added here. Differential Revision: https://reviews.llvm.org/D49162 llvm-svn: 337844
* [MachineOutliner][NFC] Make Candidates own their call informationJessica Paquette2018-07-245-42/+56
| | | | | | | | | | | | | Before this, TCI contained all the call information for each Candidate. This moves that information onto the Candidates. As a result, each Candidate can now supply how it ought to be called. Thus, Candidates will be able to, say, call the same function in cheaper ways when possible. This also removes that information from TCI, since it's no longer used there. A follow-up patch for the AArch64 outliner will demonstrate this. llvm-svn: 337840
* [MachineOutliner][NFC] Move missed opt remark into its own functionJessica Paquette2018-07-241-39/+46
| | | | | | | | Having the missed remark code in the middle of `findCandidates` made the function hard to follow. This yanks that out into a new function, `emitNotOutliningCheaperRemark`. llvm-svn: 337839
* [MachineOutliner][NFC] Sink some candidate logic into OutlinedFunctionJessica Paquette2018-07-241-15/+6
| | | | | | | | | | | | | | | | | Just some simple gardening to improve clarity. Before, we had something along the lines of 1) Create a std::vector of Candidates 2) Create an OutlinedFunction 3) Create a std::vector of pointers to Candidates 4) Copy those over to the OutlinedFunction and the Candidate list Now, OutlinedFunctions create the Candidate pointers. They're still copied over to the main list of Candidates, but it makes it a bit clearer what's going on. llvm-svn: 337838
* Use SCEV to avoid inserting some bounds checks.Joel Galenson2018-07-241-12/+28
| | | | | | | | This patch uses SCEV to avoid inserting some bounds checks when they are not needed. This slightly improves the performance of code compiled with the bounds check sanitizer. Differential Revision: https://reviews.llvm.org/D49602 llvm-svn: 337830
* [PredicateInfo] Use custom mangling to support ssa_copy with unnamed types.Florian Hahn2018-07-241-6/+56
| | | | | | | | | | | | | | | | | | This is a workaround and it would be better to fix this generally, but doing it generally is quite tricky. See D48541 and PR38117. Doing it in PredicateInfo directly allows us to use the type address to differentiate different unnamed types, because neither the created declarations nor the ssa_copy calls should be visible after PredicateInfo got destroyed. Reviewers: efriedma, davide Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D49126 llvm-svn: 337828
* [mips] Fix local dynamic TLS with Sym64Simon Atanasyan2018-07-246-22/+20
| | | | | | | | | | | | | | | | For the final DTPREL addition, rather than a lui/daddiu/daddu triple, LLVM was erronously emitting a daddiu/daddiu pair, treating the %dtprel_hi as if it were a %dtprel_lo, since Mips::Hi expands unshifted for Sym64. Instead, use a new TlsHi node and, although unnecessary due to the exact structure of the nodes emitted, use TlsHi for local exec too to prevent future bugs. Also garbage-collect the unused TprelLo and TlsGd nodes, and TprelHi since its functionality is provided by the new common TlsHi node. Patch by James Clarke. Differential revision: https://reviews.llvm.org/D49259 llvm-svn: 337827
* [x86/SLH] Extract the core register hardening logic to a low-levelChandler Carruth2018-07-241-36/+73
| | | | | | | | | | | | | | | | | helper and restructure the post-load hardening to use this. This isn't as trivial as I would have liked because the post-load hardening used a trick that only works for it where it swapped in a temporary register to the load rather than replacing anything. However, there is a simple way to do this without that trick that allows this to easily reuse a friendly API for hardening a value in a register. That API will in turn be usable in subsequent patcehs. This also techincally changes the position at which we insert the subreg extraction for the predicate state, but that never resulted in an actual instruction and so tests don't change at all. llvm-svn: 337825
* [x86/SLH] Tidy up a comment, using doxygen structure and wording it toChandler Carruth2018-07-241-5/+7
| | | | | | be more accurate and understandable. llvm-svn: 337822
* [ARM] Disable ARMCodeGenPrepare by defaultSam Parker2018-07-241-1/+1
| | | | | | | | ARM Stage 2 builders have been suspiciously broken since the pass was committed. Disabling to hopefully fix the bots and give me time to debug. llvm-svn: 337821
* ADT: Shrink SmallVector size 0 to 16B on 64-bit platformsDuncan P. N. Exon Smith2018-07-241-2/+21
| | | | | | | | | | | | | | | | | | | | | SmallVectorTemplateCommon wants to know the address of the first element so it can detect whether it's in "small size" mode. The old implementation split the small array, creating the storage for the first element in SmallVectorTemplateCommon, and pulling the rest into SmallVectorStorage where we know the size of the array. This bloats SmallVector size 0 by the larger of sizeof(void*) and sizeof(T), and we're not even using the storage. The new implementation leaves the full small storage to SmallVectorStorage. To calculate the offset of the first element in SmallVectorTemplateCommon, we just need to know how far to jump, which we can calculate out-of-band. One subtlety is that we need SmallVectorStorage to be properly aligned even when the size is 0, to be sure that (for large alignments) we actually have the padding and it's well defined to do the pointer math. llvm-svn: 337820
* Revert "[DebugInfo] Generate DWARF debug information for labels."Shiva Chen2018-07-2414-386/+149
| | | | | | This reverts commit b454fa1b4079b6c0a5b1565982d16516385838d7. llvm-svn: 337812
* [DebugInfo] Generate DWARF debug information for labels.Shiva Chen2018-07-2414-149/+386
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two forms for label debug information in DWARF format. 1. Labels in a non-inlined function: DW_TAG_label DW_AT_name DW_AT_decl_file DW_AT_decl_line DW_AT_low_pc 2. Labels in an inlined function: DW_TAG_label DW_AT_abstract_origin DW_AT_low_pc We will collect label information from DBG_LABEL. Before every DBG_LABEL, we will generate a temporary symbol to denote the location of the label. The symbol could be used to get DW_AT_low_pc afterwards. So, we create a mapping between 'inlined label' and DBG_LABEL MachineInstr in DebugHandlerBase. The DBG_LABEL in the mapping is used to query the symbol before it. The AbstractLabels in DwarfCompileUnit is used to process labels in inlined functions. We also keep a mapping between scope and labels in DwarfFile to help to generate correct tree structure of DIEs. Differential Revision: https://reviews.llvm.org/D45556 Patch by Hsiangkai Wang. llvm-svn: 337799
* AMDGPU/GlobalISel: Legalize G_INSERTTom Stellard2018-07-241-1/+1
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49601 llvm-svn: 337798
* AMDGPU/GlobalISel: Remove unnecessary legality constraint for G_EXTRACTTom Stellard2018-07-241-3/+0
| | | | | | | | | | | | | | | | | | Summary: We were marking G_EXTRACT operations unsupported if the output type was larger than the input type. I don't see how this could ever actually happen, so I dropped the constraint. Doing this makes it possible to reuse the same legality code for G_INSERT. Reviewers: arsenm Reviewed By: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D49600 llvm-svn: 337794
* Add PerfJITEventListener for perf profiling support.Andres Freund2018-07-245-1/+529
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This new JIT event listener supports generating profiling data for the linux 'perf' profiling tool, allowing it to generate function and instruction level profiles. Currently this functionality is not enabled by default, but must be enabled with LLVM_USE_PERF=yes. Given that the listener has no dependencies, it might be sensible to enable by default once the initial issues have been shaken out. I followed existing precedent in registering the listener by default in lli. Should there be a decision to enable this by default on linux, that should probably be changed. Please note that until https://reviews.llvm.org/D47343 is resolved, using this functionality with mcjit rather than orcjit will not reliably work. Disregarding the previous comment, here's an example: $ cat /tmp/expensive_loop.c bool stupid_isprime(uint64_t num) { if (num == 2) return true; if (num < 1 || num % 2 == 0) return false; for(uint64_t i = 3; i < num / 2; i+= 2) { if (num % i == 0) return false; } return true; } int main(int argc, char **argv) { int numprimes = 0; for (uint64_t num = argc; num < 100000; num++) { if (stupid_isprime(num)) numprimes++; } return numprimes; } $ clang -ggdb -S -c -emit-llvm /tmp/expensive_loop.c -o /tmp/expensive_loop.ll $ perf record -o perf.data -g -k 1 ./bin/lli -jit-kind=mcjit /tmp/expensive_loop.ll 1 $ perf inject --jit -i perf.data -o perf.jit.data $ perf report -i perf.jit.data - 92.59% lli jitted-5881-2.so [.] stupid_isprime stupid_isprime main llvm::MCJIT::runFunction llvm::ExecutionEngine::runFunctionAsMain main __libc_start_main 0x4bf6258d4c544155 + 0.85% lli ld-2.27.so [.] do_lookup_x And line-level annotations also work: │ for(uint64_t i = 3; i < num / 2; i+= 2) { │1 30: movq $0x3,-0x18(%rbp) 0.03 │1 38: mov -0x18(%rbp),%rax 0.03 │ mov -0x10(%rbp),%rcx │ shr $0x1,%rcx 3.63 │ ┌──cmp %rcx,%rax │ ├──jae 6f │ │ if (num % i == 0) 0.03 │ │ mov -0x10(%rbp),%rax │ │ xor %edx,%edx 89.00 │ │ divq -0x18(%rbp) │ │ cmp $0x0,%rdx 0.22 │ │↓ jne 5f │ │ return false; │ │ movb $0x0,-0x1(%rbp) │ │↓ jmp 73 │ │ } 3.22 │1 5f:│↓ jmp 61 │ │ for(uint64_t i = 3; i < num / 2; i+= 2) { Subscribers: mgorny, llvm-commits Differential Revision: https://reviews.llvm.org/D44892 llvm-svn: 337789
* [x86/SLH] Simplify the code for hardening a loaded value. NFC.Chandler Carruth2018-07-241-20/+15
| | | | | | | This is in preparation for extracting this into a re-usable utility in this code. llvm-svn: 337785
* [x86/SLH] Remove complex SHRX-based post-load hardening.Chandler Carruth2018-07-241-73/+10
| | | | | | | | | | | | | | | | | | | This code was really nasty, had several bugs in it originally, and wasn't carrying its weight. While on Zen we have all 4 ports available for SHRX, on all of the Intel parts with Agner's tables, SHRX can only execute on 2 ports, giving it 1/2 the throughput of OR. Worse, all too often this pattern required two SHRX instructions in a chain, hurting the critical path by a lot. Even if we end up needing to safe/restore EFLAGS, that is no longer so bad. We pay for a uop to save the flag, but we very likely get fusion when it is used by forming a test/jCC pair or something similar. In practice, I don't expect the SHRX to be a significant savings here, so I'd like to avoid the complex code required. We can always resurrect this if/when someone has a specific performance issue addressed by it. llvm-svn: 337781
* [DWARF] Use deque in place of SmallVector to fix use-after-free issueFangrui Song2018-07-231-3/+6
| | | | | | | | | | | | Summary: SmallVector's elements are moved when resizing and cause use-after-free. Reviewers: probinson, dblaikie Subscribers: JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D49702 llvm-svn: 337772
* Embed a template specialization in a namespace to work around a gcc bug.Wolfgang Pieb2018-07-231-0/+2
| | | | llvm-svn: 337770
* [DWARF v5] Refactor range lists dumping by using a more generic way of ↵Wolfgang Pieb2018-07-234-190/+132
| | | | | | | | | | | | | handling tables of lists. The intent is to use it for location list tables as well. Change is almost NFC with the exception of the spelling of some strings used during dumping (all lowercase now). Reviewer: JDevlieghere Differential Revision: https://reviews.llvm.org/D49500 llvm-svn: 337763
* [LTO] Handle __imp_ (dllimport) symbols consistently with lldTeresa Johnson2018-07-231-1/+8
| | | | | | | | | | | | | | | | | | | Summary: Similar to what lld already does for dllimport symbols which are prefaced with __imp_ (see lld patch r240620), strip off the __imp_ prefix in LTO. Otherwise we can get 2 separate GlobalResolution for a single symbol, the dllimport declaration, and the definition, which leads to incorrect LTO handling. Fixes PR38105. Reviewers: pcc Subscribers: mehdi_amini, inglorion, steven_wu, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D49138 llvm-svn: 337762
* [demangler] call terminate() if allocation failedErik Pilkington2018-07-232-4/+17
| | | | | | | | | | We really should set *status to memory_alloc_failure, but we need to refactor the demangler a bit to properly propagate the failure up the stack. Until then, its better to explicitly terminate then rely on a null dereference crash. rdar://31240372 llvm-svn: 337759
* [MC] Add a separate flag for skipping comdat constant sections for MinGW. NFC.Martin Storsjo2018-07-232-3/+13
| | | | | | | | | | | This actually has nothing to do with the associative comdat sections that aren't supported by GNU binutils ld. Clarify the comments from SVN r335918 and use a separate flag for it. Differential Revision: https://reviews.llvm.org/D49645 llvm-svn: 337757
OpenPOWER on IntegriCloud