summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [IR] Reformulate LLVM's EH funclet IRDavid Majnemer2015-12-1233-2285/+1165
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While we have successfully implemented a funclet-oriented EH scheme on top of LLVM IR, our scheme has some notable deficiencies: - catchendpad and cleanupendpad are necessary in the current design but they are difficult to explain to others, even to seasoned LLVM experts. - catchendpad and cleanupendpad are optimization barriers. They cannot be split and force all potentially throwing call-sites to be invokes. This has a noticable effect on the quality of our code generation. - catchpad, while similar in some aspects to invoke, is fairly awkward. It is unsplittable, starts a funclet, and has control flow to other funclets. - The nesting relationship between funclets is currently a property of control flow edges. Because of this, we are forced to carefully analyze the flow graph to see if there might potentially exist illegal nesting among funclets. While we have logic to clone funclets when they are illegally nested, it would be nicer if we had a representation which forbade them upfront. Let's clean this up a bit by doing the following: - Instead, make catchpad more like cleanuppad and landingpad: no control flow, just a bunch of simple operands; catchpad would be splittable. - Introduce catchswitch, a control flow instruction designed to model the constraints of funclet oriented EH. - Make funclet scoping explicit by having funclet instructions consume the token produced by the funclet which contains them. - Remove catchendpad and cleanupendpad. Their presence can be inferred implicitly using coloring information. N.B. The state numbering code for the CLR has been updated but the veracity of it's output cannot be spoken for. An expert should take a look to make sure the results are reasonable. Reviewers: rnk, JosephTremoulet, andrew.w.kaylor Differential Revision: http://reviews.llvm.org/D15139 llvm-svn: 255422
* [PowerPC] OutStreamer cleanup in PPCAsmPrinterHal Finkel2015-12-121-23/+19
| | | | | | | | We don't need to pass OutStreamer as a parameter to LowerSTACKMAP and LowerPATCHPOINT. It is a member variable of PPCAsmPrinter, and thus, is already available. NFC. llvm-svn: 255418
* [X86ISelLowering] Add additional support for multiplication-to-shift conversion.Chen Li2015-12-121-3/+25
| | | | | | | | | | | | Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this. Reviewers: craig.topper, RKSimon Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14603 llvm-svn: 255415
* [InstCombine] allow any pair of bitcasts to be combinedSanjay Patel2015-12-121-10/+12
| | | | | | | | | | | | | | | | | | | | This change is discussed in D15392 and should allow us to effectively revert: http://llvm.org/viewvc/llvm-project?view=revision&revision=255261 if we canonicalize bitcasts ahead of extracts. It should be safe to convert any pair of bitcasts into a single bitcast, however, it was mentioned here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20110829/127089.html that we're not allowed to bitcast from an x86_mmx to some other types, but I'm not seeing any failures from that, and we have regression tests in CodeGen/X86 that appear to cover all of those cases. Some day we'll get to remove that MMX wart from LLVM IR completely? Differential Revision: http://reviews.llvm.org/D15468 llvm-svn: 255399
* [PowerPC] Add Branch Hints for Highly-Biased BranchesHal Finkel2015-12-122-2/+74
| | | | | | | | | | | This branch adds hints for highly biased branches on the PPC architecture. Even in absence of profiling information, LLVM will mark code reaching unreachable terminators and other exceptional control flow constructs as highly unlikely to be reached. Patch by Tom Jablin! llvm-svn: 255398
* [WebAssembly] Update test expectationsDerek Schuff2015-12-121-108/+49
| | | | | | | | | Many tests are now passing due to eliminateFrameIndex implementation and the list needs to be re-triaged because it unblocks other failures, and some previous failures are different. However I'm about to churn it more by implementing more lowering, so will wait on that. llvm-svn: 255396
* Revert rL255391: [X86ISelLowering] Add additional support for ↵Chen Li2015-12-121-26/+3
| | | | | | | | multiplication-to-shift conversion. because it broke buildbot. llvm-svn: 255395
* [WebAssembly] Implement prolog/epilog insertion and FrameIndex eliminationDerek Schuff2015-12-1111-19/+1233
| | | | | | | | | | | | | | | | | | Summary: Use the SP32 physical register as the base for FrameIndex lowering. Update it and the __stack_pointer global var in the prolog and epilog. Extend the mapping of virtual registers to wasm locals to include the physical registers. Rather than modify the target-independent PrologEpilogInserter (which asserts that there are no virtual registers left) include a slightly-modified copy for Wasm that does not have this assertion and only clears the virtual registers if scavenging was needed (which of course it isn't for wasm). Differential Revision: http://reviews.llvm.org/D15344 llvm-svn: 255392
* [X86ISelLowering] Add additional support for multiplication-to-shift conversion.Chen Li2015-12-111-3/+26
| | | | | | | | | | | | Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this. Reviewers: craig.topper, RKSimon Subscribers: aemerson, llvm-commits Differential Revision: http://reviews.llvm.org/D14603 llvm-svn: 255391
* SamplePGO - Reduce memory utilization by 10x.Diego Novillo2015-12-111-1/+1
| | | | | | | | | | | | | | | DenseMap is the wrong data structure to use for sample records and call sites. The keys are too large, causing massive core memory growth when reading profiles. Before this patch, a 21Mb input profile was causing the compiler to grow to 3Gb in memory. By switching to std::map, the compiler now grows to 300Mb in memory. There still are some opportunities for memory footprint reduction. I'll be looking at those next. llvm-svn: 255389
* SelectionDAG: Match min/max if the scalar operation is legalMatt Arsenault2015-12-112-10/+42
| | | | llvm-svn: 255388
* Revert r248483, r242546, r242545, and r242409 - absdiff intrinsicsHal Finkel2015-12-1110-115/+35
| | | | | | | | | | | | | | | | | | | | After much discussion, ending here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html it has been decided that, instead of having the vectorizer directly generate special absdiff and horizontal-add intrinsics, we'll recognize the relevant reduction patterns during CodeGen. Accordingly, these intrinsics are not needed (the operations they represent can be pattern matched, as is already done in some backends). Thus, we're backing these out in favor of the current development work. r248483 - Codegen: Fix llvm.*absdiff semantic. r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation llvm-svn: 255387
* Avoid buffered reads of /dev/urandomRafael Espindola2015-12-111-4/+9
| | | | | | | | | | | | | | I am seeing disappointing clang performance on a large PowerPC64 Linux box. GetRandomNumberSeed() does a buffered read from /dev/urandom to seed its PRNG. As a result we read an entire page even though we only need 4 bytes. With every clang task reading a page worth of /dev/urandom we end up spending a large amount of time stuck on kernel spinlock. Patch by Anton Blanchard! llvm-svn: 255386
* [PGO] Revert r255365: solution incomplete, not handling lambda yetXinliang David Li2015-12-111-5/+4
| | | | llvm-svn: 255369
* [PGO] Stop using invalid char in instr variable names.Xinliang David Li2015-12-111-4/+5
| | | | | | | | | | | | | | Before the patch, -fprofile-instr-generate compile will fail if no integrated-as is specified when the file contains any static functions (the -S output is also invalid). This patch fixed the issue. With the change, the index format version will be bumped up by 1. Backward compatibility is preserved with this change. Differential Revision: http://reviews.llvm.org/D15243 llvm-svn: 255365
* CodeGen: Redo analyzePhysRegs() and computeRegisterLiveness()Matthias Braun2015-12-116-78/+57
| | | | | | | | | | | | | | | | | | | | computeRegisterLiveness() was broken in that it reported dead for a register even if a subregister was alive. I assume this was because the results of analayzePhysRegs() are hard to understand with respect to subregisters. This commit: Changes the results of analyzePhysRegs (=struct PhysRegInfo) to be clearly understandable, also renames the fields to avoid silent breakage of third-party code (and improve the grammar). Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling it and removing workarounds for the bug. This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033 Differential Revision: http://reviews.llvm.org/D15320 llvm-svn: 255362
* Start replacing vector_extract/vector_insert with extractelt/inserteltMatt Arsenault2015-12-1111-101/+99
| | | | | | | | | | | | | | | | | | | | These are redundant pairs of nodes defined for INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT. insertelement/extractelement are slightly closer to the corresponding C++ node name, and has stricter type checking so prefer it. Update targets to only use these nodes where it is trivial to do so. AArch64, ARM, and Mips all have various type errors on simple replacement, so they will need work to fix. Example from AArch64: def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8), (i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>; Which is trying to do sext_inreg i8, i8. llvm-svn: 255359
* [WebAssembly] Fix ADJCALLSTACKDOWN/UP use/defsDerek Schuff2015-12-112-5/+10
| | | | | | | | | | | | | | | | | | | | | | Summary: ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use and def the stack pointer. Since they do not, all the nodes are being eliminated by DeadMachineInstructionElim, so they aren't in the IR when PrologEpilogInserter/eliminateCallFramePseudo needs them. This change fixes that, but since RegStackify will not stackify across them (and it runs early, before PEI), change LowerCall to only emit them when the call frame size is > 0. That makes the current code work the same way and makes code handled by D15344 also work the same way. We can expand the condition beyond NumBytes > 0 in the future if needed. Reviewers: sunfish, jfb Subscribers: jfb, dschuff, llvm-commits Differential Revision: http://reviews.llvm.org/D15459 llvm-svn: 255356
* Revert r255247, r255265, and r255286 due to serious compile-time regressions.Chad Rosier2015-12-111-242/+93
| | | | | | | | Revert "[DSE] Disable non-local DSE to see if the bots go green." Revert "[DeadStoreElimination] Use range-based loops. NFC." Revert "[DeadStoreElimination] Add support for non-local DSE." llvm-svn: 255354
* CXX_FAST_TLS calling convention: target independent portion.Manman Ren2015-12-111-1/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The access function has a short entry and a short exit, the initialization block is only run the first time. To improve the performance, we want to have a short frame at the entry and exit. We explicitly handle most of the CSRs via copies. Only the CSRs that are not handled via copies will be in CSR_SaveList. Frame lowering and prologue/epilogue insertion will generate a short frame in the entry and exit according to CSR_SaveList. The majority of the CSRs will be handled by register allcoator. Register allocator will try to spill and reload them in the initialization block. We add CSRsViaCopy, it will be explicitly handled during lowering. 1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target supports it for the given calling convention and the function has only return exits). We also call TLI->initializeSplitCSR to perform initialization. 2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to virtual registers at beginning of the entry block and copies from virtual registers to CSRsViaCopy at beginning of the exit blocks. 3> we also need to make sure the explicit copies will not be eliminated. rdar://problem/23557469 Differential Revision: http://reviews.llvm.org/D15340 llvm-svn: 255353
* fix typos; NFCSanjay Patel2015-12-111-3/+3
| | | | llvm-svn: 255352
* AlignmentFromAssumptions and SLPVectorizer preserves AA and GlobalsAAHal Finkel2015-12-112-0/+7
| | | | | | | | | | | | GlobalsAA's assumptions that passes do not escape globals not previously escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline. http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html Patch by Vaivaswatha Nagaraj! llvm-svn: 255348
* PruneEH pass incorrectly reports that a change was madeArtur Pilipenko2015-12-111-12/+7
| | | | | | | | Reviewed By: reames Differential Revision: http://reviews.llvm.org/D14097 llvm-svn: 255343
* [Mem2Reg] Respect optnoneJames Molloy2015-12-111-0/+3
| | | | | | | | Mem2Reg shouldn't be optimizing a function that is marked optnone. There is a test checking this that fails when mem2reg is explicitly added to the standard pass pipeline. llvm-svn: 255336
* [InstCombine] Make MatchBSwap also match bit reversalsJames Molloy2015-12-112-103/+136
| | | | | | MatchBSwap has most of the functionality to match bit reversals already. If we switch it from looking at bytes to individual bits and remove a few early exits, we can extend the main recursive function to match any sequence of ORs, ANDs and shifts that assemble a value from different parts of another, base value. Once we have this bit->bit mapping, we can very simply detect if it is appropriate for a bswap or bitreverse. llvm-svn: 255334
* [PGO] Read VP raw data without depending on the Value fieldXinliang David Li2015-12-111-3/+9
| | | | | | | | | | | | | | | | Before this patch, each function's on-disk VP data is 'pointed' to by the Value field of per-function ProfileData structue, and read relies on this field (relocated with ValueDataDelta field) to read the value data. However this means the Value field needs to be updated during runtime before dumping, which creates undesirable data races. With this patch, the reading of VP data no longer depends on Value field. There is no format change. ValueDataDelta header field becomes obsolute but will be kept for compatibility reason (will be removed next time the raw format change is needed). llvm-svn: 255329
* Fix build after r255319.Hans Wennborg2015-12-111-1/+1
| | | | llvm-svn: 255322
* [LazyValueInfo] Stop inserting overdefined values into ValueCache toAkira Hatanaka2015-12-111-18/+48
| | | | | | | | | | | | | | | | | | reduce memory usage. Previously, LazyValueInfoCache inserted overdefined lattice values into both ValueCache and OverDefinedCache. This wasn't necessary and was causing LazyValueInfo to use an excessive amount of memory in some cases. This patch changes LazyValueInfoCache to insert overdefined values only into OverDefinedCache. The memory usage decreases by 70 to 75% when one of the files in llvm is compiled. rdar://problem/11388615 Differential revision: http://reviews.llvm.org/D15391 llvm-svn: 255320
* [PPC]: Peephole optimize small accesss to aligned globals.Kyle Butt2015-12-111-9/+26
| | | | | | | | | | | | | | | | | | | | | | | Access to aligned globals gives us a chance to peephole optimize nonzero offsets. If a struct is 4 byte aligned, then accesses to bytes 0-3 won't overflow the available displacement. For example: addis 3, 2, b4v@toc@ha addi 4, 3, b4v@toc@l lbz 5, b4v@toc@l(3) ; This is the result of the current peephole lbz 6, 1(4) ; optimizer lbz 7, 2(4) lbz 8, 3(4) If b4v is 4-byte aligned, we can skip using register 4 because we know that b4v@toc@l+{1,2,3} won't overflow 32K, and instead generate: addis 3, 2, b4v@toc@ha lbz 4, b4v@toc@l(3) lbz 5, b4v@toc@l+1(3) lbz 6, b4v@toc@l+2(3) lbz 7, b4v@toc@l+3(3) Saving a register and an addition. Larger alignments allow larger structures/arrays to be optimized. llvm-svn: 255319
* [ProfileData] clang-format TextInstrProfReader::hasFormat. NFC.Vedant Kumar2015-12-111-2/+3
| | | | llvm-svn: 255317
* [X86][SSE] Update the cost table for integer-integer conversions on SSE2/SSE4.1.Cong Hou2015-12-111-2/+79
| | | | | | | | | | | | Previously in the conversion cost table there are no entries for integer-integer conversions on SSE2. This will result in imprecise costs for certain vectorized operations. This patch adds those entries for SSE2 and SSE4.1. The cost numbers are counted from the result of running llc on the new test case in this patch. Differential revision: http://reviews.llvm.org/D15132 llvm-svn: 255315
* Format fix (NFC)Xinliang David Li2015-12-101-2/+4
| | | | llvm-svn: 255313
* Fix (bitcast (fabs x)), (bitcast (fneg x)) and (bitcast (fcopysign cst,Eric Christopher2015-12-101-0/+68
| | | | | | | | | | | | x)) combines for ppc_fp128, since signbit computation is more complicated. Discussion thread: http://lists.llvm.org/pipermail/llvm-dev/2015-November/092863.html Patch by Tim Shen! llvm-svn: 255305
* PPC: Teach FMA mutate to respect register classes.Kyle Butt2015-12-101-2/+9
| | | | | | | | | This was causing bad code gen and assembly that won't assemble, as mixed altivec and vsx code would end up with a vsx high register assigned to an altivec instruction, which won't work. Constraining the classes allows the optimization to proceed. llvm-svn: 255299
* [LibFuzzer] Introducing FUZZER_FLAG_UNSIGNED and using it for seeding.Mike Aizatsky2015-12-105-9/+25
| | | | | | | | Differential Revision: http://reviews.llvm.org/D15339 done llvm-svn: 255296
* Delete a duplicate branch in IfConversion.cpp. NFC.Cong Hou2015-12-101-9/+0
| | | | llvm-svn: 255291
* [DAGCombiner] Fix PR25763 - vector comparison constant folding + sign-extensionSimon Pilgrim2015-12-101-5/+8
| | | | | | PR25763 demonstrated an issue with D14683 - vector comparison constant folding only works for i1 results, so we need to split off the sign-extension of the result to the required type. Luckily this can be done with the existing type legalization code. llvm-svn: 255289
* [DSE] Disable non-local DSE to see if the bots go green.Chad Rosier2015-12-101-1/+1
| | | | | | I see a few bots timing out, so I'm speculatively disabling r255247. llvm-svn: 255286
* Fix another case where the linkage was not set.Rafael Espindola2015-12-101-1/+1
| | | | llvm-svn: 255272
* Verifier: Avoid quadratic checking of aggregates for bad bitcastsDuncan P. N. Exon Smith2015-12-101-38/+37
| | | | | | | | | | | | | | | | | | | | | | | Avoid O(N^2) behaviour when checking for bad bitcasts in `ConstantExpr`s buried inside of aggregate initializers to `GlobalVariable`s. I've: - centralized the "visited" set for recursing through `ConstantExpr`s so that expressions are only visited once per Verifier run, - removed the duplicate logic for the stack visit, and - avoided recursing into other `GlobalValue`s. This recovers roughly a 100x time difference in clang compiles of a particular input file (filled with large cross-referencing tables) that depends on whether `-disable-llvm-verifier` is on. This slowdown was caused by r187506, which introduced these checks. Now, avoiding `-disable-llvm-verifier` only causes a 2x slowdown for this case. (Interestingly, dumping the textual IR for this file starts at least 50GB of global variable initializers (I don't know the total, since I killed the dump)...) llvm-svn: 255269
* [DeadStoreElimination] Use range-based loops. NFC.Chad Rosier2015-12-101-9/+6
| | | | llvm-svn: 255265
* [ProfileData] Add unit test infrastructure for sample profile reader/writerNathan Slingerland2015-12-102-12/+54
| | | | | | | | | | | | | | | Summary: Adds support for in-memory round-trip of sample profile data along with basic round trip unit tests. This will also make it easier to include unit tests for future changes to sample profiling. Reviewers: davidxl, dnovillo, silvas Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D15211 llvm-svn: 255264
* Fix fptosi, fptoui from f16 vectors to i8, i16 vectorsPirama Arumuga Nainar2015-12-101-0/+10
| | | | | | | | | | | | | | | | Summary: Convert f16 vectors to corresponding f32 vectors before doing the conversion to int. Add tests for v4f16, v8f16. Reviewers: ab, jmolloy Subscribers: llvm-commits, srhines Differential Revision: http://reviews.llvm.org/D14936 llvm-svn: 255263
* [InstCombine] fold bitcasts around an extractelement (3rd try)Sanjay Patel2015-12-101-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a redo of r255137 (reverted at r255227) which was a redo of r255124 (reverted at r255126) with a fixed check for a scalar source type and an added test for the failure that caused the revert. Original commit message: Example: bitcast (extractelement (bitcast <2 x float> %X to <2 x i32>), 1) to float ---> extractelement <2 x float> %X, i32 1 This is part of fixing PR25543: https://llvm.org/bugs/show_bug.cgi?id=25543 The next step will be to generalize this fold: trunc ( lshr ( bitcast X) ) -> extractelement (X) Ie, I'm hoping to replace the existing transform of: bitcast ( trunc ( lshr ( bitcast X))) added by: http://reviews.llvm.org/rL112232 with 2 less specific transforms to catch the case in the bug report. Differential Revision: http://reviews.llvm.org/D14879 llvm-svn: 255261
* [ThinLTO] Debug message cleanup (NFC)Teresa Johnson2015-12-101-8/+8
| | | | | | | | Added some missing spaces between the module identifier and the start of the debug message. Also added a ":" after the module identifier to make this look a little nicer. llvm-svn: 255259
* Avoid undefined behavior when vector is empty.Rafael Espindola2015-12-101-2/+1
| | | | | | Found by ubsan. llvm-svn: 255258
* remove duplicated comments and don't repeat function names in comments; NFCSanjay Patel2015-12-101-142/+83
| | | | llvm-svn: 255257
* Slit lib/Linker in two.Rafael Espindola2015-12-104-1273/+1510
| | | | | | | | | | | | | | | | A linker normally has two stages: symbol resolution and "moving stuff". In lib/Linker there is the complication of lazy linking some globals, but it was still far more mixed than it needed to. This splits the linker into a lower level IRMover and the linker proper. The IRMover just takes a list of globals to move and a callback that lets the user control what is lazy linked. The main motivation is that now tools/gold (and soon lld) can use their own symbol resolution to instruct IRMover what to do. llvm-svn: 255254
* [WebAssembly] Make WebAssemblyStoreResults only return true when it has a ↵Dan Gohman2015-12-101-1/+3
| | | | | | change. llvm-svn: 255253
* [WebAssembly] Fix WebAssemblyPeephole to set Changed to true when making ↵Dan Gohman2015-12-101-0/+1
| | | | | | changes. llvm-svn: 255252
OpenPOWER on IntegriCloud