path: root/llvm/lib
Commits (author, date; files changed, lines -removed/+added):
...
* MemoryBuffer: Don't use mmap when FileSize is a multiple of 4k on Cygwin. (NAKAMURA Takumi, 2014-08-04; 1 file, -0/+8)
  On Cygwin, getpagesize() returns 64k (the AllocationGranularity). In r214580, the size of
  X86GenInstrInfo.inc became 1499136.
  FIXME: We should reorganize getPageSize() on Win32 again. MapFile allocates addresses along the
  AllocationGranularity, but the view is mapped by physical page.
  llvm-svn: 214681
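A minimal sketch of the guard this entry describes, assuming the check lives in MemoryBuffer's mmap heuristic; the function and constant names here are illustrative, not the actual code:

    #include <cstdint>

    // Sketch only: skip mmap on Cygwin when the file ends exactly on a 4K page
    // boundary, since the 64K allocation granularity leaves no slack page for a
    // NUL terminator; read into an allocated buffer instead.
    static bool shouldUseMmapOnCygwin(uint64_t FileSize) {
      const uint64_t PhysicalPageSize = 4096; // views are mapped by 4K pages
      return FileSize % PhysicalPageSize != 0;
    }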
* [x86] Handle single input shuffles in the SSSE3 case more intelligently. (Chandler Carruth, 2014-08-04; 1 file, -0/+4)
  I spent some time looking into a better or more principled way to handle this. For example, by
  detecting arbitrary "unneeded" ORs... But really, there wasn't any point. We just shouldn't build
  blatantly wrong code so late in the pipeline rather than adding more stages and logic later on to
  fix it. Avoiding this is just too simple.
  llvm-svn: 214680
* [LLVM-C] Add LLVM{IsConstantString,GetAsString,GetElementAsConstant}. (Peter Zotov, 2014-08-03; 1 file, -0/+16)
  llvm-svn: 214676
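A minimal usage sketch of the three new C API entry points; the setup via LLVMConstStringInContext is just one way to obtain a constant string to query:

    #include <llvm-c/Core.h>
    #include <cstdio>

    int main() {
      LLVMContextRef Ctx = LLVMContextCreate();

      // Make a constant i8 array holding "hello" (not NUL-terminated).
      LLVMValueRef Str =
          LLVMConstStringInContext(Ctx, "hello", 5, /*DontNullTerminate=*/1);

      if (LLVMIsConstantString(Str)) {
        size_t Len = 0;
        const char *Data = LLVMGetAsString(Str, &Len);
        std::printf("constant string of length %zu: %.*s\n", Len, (int)Len, Data);
      }

      // Pull out a single element of the constant sequence as its own constant.
      LLVMValueRef FirstChar = LLVMGetElementAsConstant(Str, 0);
      (void)FirstChar;

      LLVMContextDispose(Ctx);
      return 0;
    }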
* [x86] Don't add nodes to the combined set (and prune subsequent combines) until they are legal. (Chandler Carruth, 2014-08-03; 1 file, -8/+8)
  Doing it the old way could, when the stars align *just* right, cause a node to get into the combine
  set prior to being legalized. Then, when the same node showed up as an operand to another node later
  on (but not so much later on that it had been deleted as dead) we would fail to add it back to the
  worklist thinking it had already been combined. This would in turn cause it to not be legalized.
  Fortunately, we can also walk the operands looking for uncombined (and thus potentially un-legalized)
  nodes late. It will still ensure that we walk all operands of all nodes and send all of them through
  both the legalizer without changes and the combiner at least once. (Which was the original goal of
  this.)
  I have a test case for this bug, but it is terribly brittle. For example, it will stop finding the
  bug the moment I enable the new shuffle lowering. I don't yet have any test case that reliably
  exercises this bug, and it isn't clear that it will be possible to craft one. It is entirely
  possible that with the new shuffle lowering the two forms of doing this are precisely equivalent.
  That doesn't mean we shouldn't take the more conservative approach of insisting on things in the
  combined set having survived the legalizer.
  llvm-svn: 214673
* X86: silence warning (-Wparentheses) (Saleem Abdulrasool, 2014-08-03; 1 file, -1/+1)
  GCC 4.8.2 points out the ambiguity in evaluation of the assertion condition:
    lib/Target/X86/X86FloatingPoint.cpp:949:49: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
      assert(STReturns == 0 || isMask_32(STReturns) && N <= 2);
  llvm-svn: 214672
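The conventional fix for this warning is to make the grouping explicit; a sketch of what the silenced assertion presumably looks like (the exact r214672 change is not shown in this log):

    // Same condition as before, with the && grouping made explicit so GCC no
    // longer warns about mixing && and || without parentheses.
    assert(STReturns == 0 || (isMask_32(STReturns) && N <= 2));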
* CodeGen: silence a warning (Saleem Abdulrasool, 2014-08-03; 1 file, -2/+1)
  GCC 4.8.2 objects to the tautological condition in the assert as the unsigned value is guaranteed
  to be >= 0. Simplify the assertion by dropping the tautological condition.
  llvm-svn: 214671
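An illustrative example of the pattern being removed; the actual assertion from r214671 is not quoted in this log, so the names here are hypothetical:

    #include <cassert>

    void checkIndex(unsigned Idx, unsigned NumElems) {
      // Before: the first clause is always true for an unsigned value, which is
      // what GCC 4.8.2 complains about.
      //   assert(Idx >= 0 && Idx < NumElems);
      // After: the same guarantee without the tautological clause.
      assert(Idx < NumElems);
    }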
* fix for PR20354 - Miscompile of fabs due to vectorization (Sanjay Patel, 2014-08-03; 1 file, -1/+5)
  This is intended to be the minimal change needed to fix PR20354
  (http://llvm.org/bugs/show_bug.cgi?id=20354). The check for a vector operation was wrong; we need to
  check that the fabs itself is not a vector operation.
  This patch will not generate the optimal code. A constant pool load and 'and' op will be generated
  instead of just returning a value that we can calculate in advance (as we do for the scalar case).
  I've put a 'TODO' comment for that here and expect to have that patch ready soon.
  There is a very similar optimization that we can do in visitFNEG, so I've put another 'TODO' there
  and expect to have another patch for that too.
  llvm-svn: 214670
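A sketch of the shape of the corrected check, not the actual patch: the type that matters is the one produced by the fabs node itself, and the helper name below is an assumption:

    #include "llvm/CodeGen/SelectionDAGNodes.h"
    using namespace llvm;

    // Sketch only: the scalar fast path is valid exactly when the FABS node's
    // own result type is not a vector; a vector fabs must take the generic
    // constant-pool-mask-and-AND path instead.
    static bool fabsHasScalarResult(const SDNode *N) {
      return N->getOpcode() == ISD::FABS && !N->getValueType(0).isVector();
    }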
* MachineCombiner Pass for selecting faster instruction sequence - AArch64 target support (Gerolf Hoflehner, 2014-08-03; 5 files, -13/+536)
  This patch turns off madd/msub generation in the DAGCombiner and generates them in the
  MachineCombiner instead. It replaces the original code sequence with the combined sequence when it
  is beneficial to do so.
  When there is no machine model support, it always generates the madd/msub instruction. This is also
  true when the objective is to optimize for code size: the combined sequence is shorter, so it is
  always chosen and does not get evaluated.
  When there is a machine model, the combined instruction sequence is evaluated for critical path and
  resource length using machine trace metrics, and the original code sequence is replaced when the
  combined sequence is determined to be faster.
  rdar://16319955
  llvm-svn: 214669
* MachineCombiner Pass for selecting faster instruction sequence - target independent framework (Gerolf Hoflehner, 2014-08-03; 5 files, -14/+503)
  When the DAGCombiner selects instruction sequences, it can increase the critical path or resource
  length. For example, on arm64 there are multiply-accumulate instructions (madd, msub). If the
  equivalent multiply-add sequence is not on the critical path, it makes sense to select it instead of
  the combined, single accumulate instruction (madd/msub). The reason is that the conversion from
  add+mul to the madd could lengthen the critical path by the latency of the multiply. But the
  DAGCombiner would always combine and select the madd/msub instruction.
  This patch uses machine trace metrics to estimate the critical path length and resource length of an
  original instruction sequence vs. a combined instruction sequence, and picks the faster code based
  on those estimates.
  This patch only commits the target-independent framework that evaluates and selects code sequences.
  The machine instruction combiner is turned off for all targets and is expected to evolve over time
  by gradually handling DAGCombiner patterns in the target-specific code.
  This framework lays the groundwork for fixing rdar://16319955
  llvm-svn: 214666
* MC: virtualise EmitWindowsUnwindTables (Saleem Abdulrasool, 2014-08-03; 2 files, -4/+7)
  This makes EmitWindowsUnwindTables a virtual function and lowers the implementation of the function
  to the X86WinCOFFStreamer. This method is a target specific operation. This enables making the
  behaviour target dependent by isolating it entirely to the target specific streamer.
  llvm-svn: 214664
* MC: rename Win64EHFrameInfo to WinEH::FrameInfo (Saleem Abdulrasool, 2014-08-03; 3 files, -26/+38)
  The frame information stored in this structure is driven by the requirements for Windows NT
  unwinding rather than Windows 64 specifically. As a result, this type can be shared across multiple
  architectures (ARM, AXP, MIPS, PPC, SH). Rename this class in preparation for adding support for
  unwinding information for Windows on ARM.
  Take the opportunity to constify the members, as everything except the ChainedParent is read-only.
  This required some adjustment to the label handling.
  llvm-svn: 214663
* R600/SI: Fix extra whitespace in asm str (Matt Arsenault, 2014-08-03; 1 file, -1/+1)
  This slipped in in r214467, so something like
    V_MOV_B32_e32 v0, ...
  is now printed with 2 spaces between the instruction name and first operand.
  llvm-svn: 214660
* [SimplifyCFG] fix accessing deleted PHINodes in switch-to-table conversion. (Manman Ren, 2014-08-02; 1 file, -1/+4)
  When we have a covered lookup table, make sure we don't delete PHINodes that are cached in PHIs.
  rdar://17887153
  llvm-svn: 214642
* tlbia support (Joerg Sonnenberger, 2014-08-02; 2 files, -0/+4)
  llvm-svn: 214640
* mfdcr / mtdcr support (Joerg Sonnenberger, 2014-08-02; 1 file, -0/+5)
  llvm-svn: 214639
* fix bug 20513 - Crash in SLP Vectorizer (Erik Eckstein, 2014-08-02; 1 file, -10/+14)
  llvm-svn: 214638
* Don't use additional arguments for dss and friends to satisfy DSS_Form, when let can do the same thing. (Joerg Sonnenberger, 2014-08-02; 2 files, -63/+54)
  Keep the 64bit variants as codegen-only. While they have a different register class, the encoding is
  the same for 32bit and 64bit mode. Having both present would otherwise confuse the disassembler.
  llvm-svn: 214636
* [AArch64] Teach DAGCombiner that converting two consecutive loads into a vector load is not a good transform when paired loads are available. (James Molloy, 2014-08-02; 1 file, -0/+7)
  The combiner was creating Q-register loads and stores, which then had to be spilled because there
  are no callee-save Q registers!
  llvm-svn: 214634
* [x86] Remove the FIXME that was implemented in r214628. Managed to forget to update the comment here... =/ (Chandler Carruth, 2014-08-02; 1 file, -4/+0)
  llvm-svn: 214630
* [x86] Largely complete the use of PSHUFB in the new vector shuffle lowering with a small addition to it and adding PSHUFB combining. (Chandler Carruth, 2014-08-02; 3 files, -14/+138)
  There is one obvious place in the new vector shuffle lowering where we should form PSHUFBs directly:
  when without them we will unpack a vector of i8s across two different registers and do a potentially
  4-way blend as i16s only to re-pack them into i8s afterward. This is the crazy expensive fallback
  path for i8 shuffles and we can just directly use pshufb here as it will always be cheaper (the
  unpack and pack are two instructions so even a single shuffle between them hits our three
  instruction limit for forming PSHUFB).
  However, this doesn't generate very good code in many cases, and it leaves a bunch of common
  patterns not using PSHUFB. So this patch also adds support for extracting a shuffle mask from PSHUFB
  in the X86 lowering code, and uses it to handle PSHUFBs in the recursive shuffle combining. This
  allows us to combine through them, combine multiple ones together, and generally produce
  sufficiently high quality code.
  Extracting the PSHUFB mask is annoyingly complex because it could be either pre-legalization or
  post-legalization. At least this doesn't have to deal with re-materialized constants. =] I've added
  decode routines to handle the different patterns that show up at this level and we dispatch through
  them as appropriate.
  The two primary test cases are updated. For the v16 test case there is still a lot of room for
  improvement. Since I was going through it systematically I left behind a bunch of FIXME lines that
  I'm hoping to turn into ALL lines by the end of this.
  llvm-svn: 214628
* [x86] Switch to using the variable we extracted this operand into. (Chandler Carruth, 2014-08-02; 1 file, -1/+1)
  Spotted this missed refactoring by inspection when reading code; it doesn't change the functionality
  at all.
  llvm-svn: 214627
* [x86] Fix a few typos in my comments spotted in passing. (Chandler Carruth, 2014-08-02; 1 file, -3/+3)
  llvm-svn: 214626
* [x86] Teach the target shuffle mask extraction to recognize unary forms of normally binary shuffle instructions like PUNPCKL and MOVLHPS. (Chandler Carruth, 2014-08-02; 1 file, -6/+26)
  This detects cases where a single register is used for both operands making the shuffle behave in a
  unary way. We detect this and adjust the mask to use the unary form which allows the existing DAG
  combine for shuffle instructions to actually work at all.
  As a consequence, this uncovered a number of obvious bugs in the existing DAG combine which are
  fixed. It also now canonicalizes several shuffles even with the existing lowering. These typically
  are trying to match the shuffle to the domain of the input where before we only really modeled them
  with the floating point variants. All of the cases which change to an integer shuffle here have
  something in the integer domain, so there are no more or fewer domain crosses here AFAICT.
  Technically, it might be better to go from a GPR directly to the floating point domain, but
  detecting floating point *outputs* despite integer inputs is a lot more code and seems unlikely to
  be worthwhile in practice. If folks are seeing domain-crossing regressions here though, let me know
  and I can hack something up to fix it.
  Also as a consequence, a bunch of missed opportunities to form pshufb now can be formed. Notably,
  splats of i8s now form pshufb. Interestingly, this improves the existing splat lowering too. We go
  from 3 instructions to 1. Yes, we may tie up a register, but it seems very likely to be worth it,
  especially if splatting the 0th byte (the common case) as then we can use a zeroed register as the
  mask.
  llvm-svn: 214625
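A hedged, standalone sketch of the unary-mask canonicalization described in this entry; the real code operates on X86's decoded target shuffle masks, and the names here are assumptions:

    #include <vector>

    // When both inputs of a normally binary shuffle are the same register,
    // indices that select from the "second" operand can be remapped into the
    // first, producing the unary form the existing shuffle DAG combine expects.
    static void canonicalizeToUnaryMask(std::vector<int> &Mask, bool SameInput) {
      if (!SameInput)
        return;
      const int Size = static_cast<int>(Mask.size());
      for (int &M : Mask)
        if (M >= Size)   // element taken from operand 1
          M -= Size;     // ...is the same element of operand 0
    }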
* [x86] Teach my pshufb comment printer to handle VPSHUFB forms as well as PSHUFB forms. (Chandler Carruth, 2014-08-02; 1 file, -2/+3)
  This will be important to update some AVX tests when I add PSHUFB combining.
  llvm-svn: 214624
* [SDAG] Refactor the code which deletes nodes in the DAG combiner to do so using a single helper which adds operands back onto the worklist. (Chandler Carruth, 2014-08-02; 1 file, -54/+36)
  Several places didn't rigorously do this but a couple already did. Factoring them together and doing
  it rigorously is important to delete things recursively early on in the combiner and get a chance to
  see accurate hasOneUse values. While no existing test cases change, an upcoming patch to add DAG
  combining logic for PSHUFB requires this to work correctly.
  llvm-svn: 214623
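A sketch of what such a helper looks like. The name, the exact re-queueing condition, and the member functions used are assumptions; it is written as a DAGCombiner member, so it is not standalone code:

    // Re-queue a dying node's operands before deleting it, so the combiner sees
    // them again with up-to-date use counts (hasOneUse() may have just become
    // true for some of them and enable further combines).
    void DAGCombiner::deleteAndRecombine(SDNode *N) {
      removeFromWorklist(N);
      for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) {
        SDNode *Op = N->getOperand(i).getNode();
        if (Op->hasOneUse())
          AddToWorklist(Op);
      }
      DAG.DeleteNode(N);
    }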
* Fix issues with ISD::FNEG and ISD::FMA SDNodes where they would not be constant-folded during DAGCombine in certain circumstances. (Owen Anderson, 2014-08-02; 1 file, -0/+12)
  Unfortunately, the circumstances required to trigger the issue seem to require a pretty specific
  interaction of DAGCombines, and I haven't been able to find a testcase that reproduces on X86, ARM,
  or AArch64. The functionality added here is replicated in essentially every other DAG combine, so it
  seems pretty obviously correct.
  llvm-svn: 214622
* CodeGen: Remove commented out code (Justin Bogner, 2014-08-02; 1 file, -2/+0)
  These two lines have been commented out for over 4 years. They aren't helping anyone.
  llvm-svn: 214615
* [ARM] In dynamic-no-pic mode, ARM's post-RA pseudo expansion was incorrectly expanding the pseudo LOAD_STACK_GUARD using instructions that are normally used in pic mode. (Akira Hatanaka, 2014-08-02; 3 files, -9/+9)
  This patch fixes the bug.
  <rdar://problem/17886592>
  llvm-svn: 214614
* [MCJIT] Fix an overly-aggressive check in RuntimeDyldMachOARM. (Lang Hames, 2014-08-02; 1 file, -5/+0)
  This should fix the MachO_ARM_PIC_relocations.s test failures on some 32-bit testers.
  llvm-svn: 214613
* R600/SI: Fix formatting. (Matt Arsenault, 2014-08-02; 1 file, -22/+28)
  Avoid weird line wrapping of BuildMI dest register.
  llvm-svn: 214608
* [ASan] Use metadata to pass source-level information from Clang to ASan. (Alexey Samsonov, 2014-08-02; 1 file, -43/+56)
  Instead of creating global variables for source locations and global names, just create metadata
  nodes and strings. They will be transformed into actual globals in the instrumentation pass (if
  necessary).
  This approach is more flexible:
  1) we don't have to ensure that our custom globals survive all the optimizations
  2) if globals are discarded for some reason, we will simply ignore metadata for them and won't have
     to erase corresponding globals
  3) metadata for source locations can be reused for other purposes: e.g. we may attach source
     location metadata to alloca instructions and provide better descriptions for stack variables in
     ASan error reports.
  No functionality change.
  llvm-svn: 214604
* [SDAG] Allow the legalizer to delete an illegally typed intermediate introduced during legalization. (Chandler Carruth, 2014-08-02; 1 file, -3/+4)
  This pattern is based on other patterns in the legalizer that I changed in the same way. Now, the
  legalizer eagerly collects its garbage when necessary so that we can survive leaving such nodes
  around for it. Instead, we add an assert to make sure the node will be correctly handled by that
  layer.
  llvm-svn: 214602
* [SDAG] Let the DAG combiner take care of dead nodes rather than manually deleting them. (Chandler Carruth, 2014-08-02; 1 file, -2/+0)
  This already seems to work, as no tests fail without this.
  llvm-svn: 214601
* Add diagnostics to the vectorizer cost model. (Tyler Nowicki, 2014-08-02; 1 file, -16/+30)
  When the cost model determines vectorization is not possible/profitable these remarks print an
  analysis of that decision. Note that in selectVectorizationFactor() we can assume that OptForSize
  and ForceVectorization are mutually exclusive.
  Reviewed by Arnold Schwaighofer
  llvm-svn: 214599
* IR: Add Value::reverseUseList() (Duncan P. N. Exon Smith, 2014-08-01; 1 file, -0/+19)
  I'm going to use this to improve `verify-uselistorder`. Part of PR5680.
  llvm-svn: 214594
* PartiallyInlineLibCalls: Check sqrt result type before transforming it. (Peter Collingbourne, 2014-08-01; 1 file, -0/+4)
  Some configure scripts declare this with the wrong prototype, which can lead to an assertion
  failure.
  llvm-svn: 214593
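A hedged sketch of the kind of guard this implies; the helper name and exact conditions are assumptions, not the actual r214593 change:

    #include "llvm/IR/Instructions.h"
    using namespace llvm;

    // If a configure script declared sqrt with a bogus prototype, the call will
    // not look like `double sqrt(double)` (or the float/long double variants),
    // so the partial inlining should simply leave it alone.
    static bool looksLikeRealSqrtCall(const CallInst *Call) {
      Type *RetTy = Call->getType();
      return RetTy->isFloatingPointTy() &&
             Call->getNumArgOperands() == 1 &&
             Call->getArgOperand(0)->getType() == RetTy;
    }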
* verify-uselistorder: Move shuffleUseLists() out of lib/IR (Duncan P. N. Exon Smith, 2014-08-01; 1 file, -104/+2)
  `shuffleUseLists()` is only used in `verify-uselistorder`, so move it there to avoid bloating other
  executables. As a drive-by, update some of the header docs.
  This is part of PR5680.
  llvm-svn: 214592
* Attempt to increase the overall happiness of the MSVC-based buildbots. (Adrian Prantl, 2014-08-01; 1 file, -1/+1)
  llvm-svn: 214588
* InstrProf: Allow multiple functions with the same name (Justin Bogner, 2014-08-01; 3 files, -40/+82)
  This updates the instrumentation based profiling format so that when we have multiple functions with
  the same name (but different function hashes) we keep all of them instead of rejecting the later
  ones.
  There are a number of scenarios where this can come up where it's more useful to keep multiple
  function profiles:
  * Name collisions in unrelated libraries that are profiled together.
  * Multiple "main" functions from multiple tools built against a common library.
  * Combining profiles from different build configurations (ie, asserts and no-asserts)
  The profile format now stores the number of counters between the hash and the counts themselves, so
  that multiple sets of counts can be stored. Since this is backwards incompatible, I've bumped the
  format version and added some trivial logic to skip this when reading the old format.
  llvm-svn: 214585
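An illustrative record layout for the change described above; the field names are assumptions and do not correspond to the actual reader/writer structures:

    #include <cstdint>
    #include <string>
    #include <vector>

    // Sketch of one profile record under the bumped format version: the counter
    // count sits between the hash and the counters, so records that share a name
    // but differ in hash can be stored back to back and kept separately.
    struct InstrProfRecordSketch {
      std::string Name;               // may repeat across records
      uint64_t Hash;                  // distinguishes same-named functions
      uint64_t NumCounters;           // new field: number of counts that follow
      std::vector<uint64_t> Counts;   // the counters themselves
    };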
* UseListOrder: Guarantee that shuffles change use-list order (Duncan P. N. Exon Smith, 2014-08-01; 1 file, -9/+12)
  Change shuffleUseLists() always to change use-list order by rejecting orders that have no changes.
  This is part of PR5680.
  llvm-svn: 214584
* UseListOrder: Fix blockaddress use-list order (Duncan P. N. Exon Smith, 2014-08-01; 1 file, -6/+20)
  `parseBitcodeFile()` uses the generic `getLazyBitcodeFile()` function as a helper. Since
  `parseBitcodeFile()` isn't actually lazy -- it calls `MaterializeAllPermanently()` -- bypass the
  unnecessary call to `materializeForwardReferencedFunctions()` by extracting out a common helper
  function.
  This removes the last of the use-list churn caused by blockaddresses.
  This highlights that we can't reproduce use-list order of globals and constants when parsing lazily
  -- but that's necessarily out of scope. When we're parsing lazily, we never have all the functions
  in memory, so the use-lists of globals (and constants that reference globals) are always incomplete.
  This is part of PR5680.
  llvm-svn: 214581
* [X86] Simplify X87 stackifier pass. (Akira Hatanaka, 2014-08-01; 7 files, -308/+196)
  Stop using ST registers for function returns and inline-asm instructions and use FP registers
  instead. This allows removing a large amount of code in the stackifier pass that was needed to track
  register liveness and handle copies between ST and FP registers and function calls returning
  floating point values.
  It also fixes a bug which manifests when an ST register defined by an inline-asm instruction was
  live across another inline-asm instruction, as shown in the following sequence of machine
  instructions:
  1. INLINEASM <es:frndint> $0:[regdef], %ST0<imp-def,tied5>
  2. INLINEASM <es:fldcw $0>
  3. %FP0<def> = COPY %ST0
  <rdar://problem/16952634>
  llvm-svn: 214580
* Debug info: Infrastructure to support debug locations for fragmented variables (for example, by-value struct arguments passed in registers, or large integer values split across several smaller registers). (Adrian Prantl, 2014-08-01; 11 files, -73/+371)
  On the IR level, this adds a new type of complex address operation OpPiece to DIVariable that
  describes size and offset of a variable fragment.
  On the DWARF emitter level, all pieces describing the same variable are collected, sorted and
  emitted as DWARF expressions using the DW_OP_piece and DW_OP_bit_piece operators.
  http://reviews.llvm.org/D3373
  rdar://problem/15928306
  What this patch doesn't do / Future work:
  - This patch only adds the backend machinery to make this work, patches that change SROA and
    SelectionDAG's type legalizer to actually create such debug info will follow.
    (http://reviews.llvm.org/D2680)
  - Making the DIVariable complex expressions into an argument of dbg.value will reduce the memory
    footprint of the debug metadata.
  - The sorting/uniquing of pieces should be moved into DebugLocEntry, to facilitate the merging of
    multi-piece entries.
  llvm-svn: 214576
* [SDAG] MorphNodeTo recursively deletes dead operands of the old formulation of the node, which isn't really the desired behavior from within the combiner or legalizer, but is necessary within ISel. (Chandler Carruth, 2014-08-01; 3 files, -9/+13)
  I've added a hopefully helpful comment and fixed the only two places where this took place.
  Yet another step toward the combiner and legalizer not needing to use update listeners with virtual
  calls to manage the worklists behind legalization and combining.
  llvm-svn: 214574
* Revert "R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp"Tom Stellard2014-08-012-42/+37
| | | | | | | | This reverts commit r214566. I did not mean to commit this yet. llvm-svn: 214572
* BitcodeReader: Change mechanics of BlockAddress forward references, NFC (Duncan P. N. Exon Smith, 2014-08-01; 2 files, -38/+51)
  Now that we can reliably handle forward references to `BlockAddress` (r214563), change the mechanics
  to simplify predicting use-list order.
  Previously, we created dummy `GlobalVariable`s to represent block addresses. After every function
  was materialized, we'd go through any forward references to its blocks and RAUW them with a proper
  `BlockAddress` constant. This causes some (potentially a lot of) unnecessary use-list churn, since
  any constant expression that it's a part of will need to be rematerialized as well.
  Instead, pre-construct a `BasicBlock` immediately -- without attaching it to its (empty) `Function`
  -- and use that to construct a `BlockAddress`. This constant will not have to be regenerated. When
  the function body is parsed, hook this pre-constructed basic block up in the right place using
  `BasicBlock::insertInto()`.
  Both before and after this change, the IR is temporarily in an invalid state that gets resolved when
  `materializeForwardReferencedFunctions()` gets called.
  This is a prep commit that's part of PR5680, but the only functionality change is the reduction of
  churn in the constant pool.
  llvm-svn: 214570
* R600/SI: Remove leftover debugging code (Tom Stellard, 2014-08-01; 1 file, -2/+0)
  llvm-svn: 214569
* R600: Move code for generating REGISTER_LOAD into R600ISelLowering.cpp (Tom Stellard, 2014-08-01; 2 files, -37/+42)
  SI doesn't use REGISTER_LOAD anymore, but it was still hitting this code path for 8-bit and 16-bit
  private loads.
  llvm-svn: 214566
* IR: Add BasicBlock::insertInto() (Duncan P. N. Exon Smith, 2014-08-01; 1 file, -6/+13)
  Although unlinked `BasicBlock`s can be created, there's currently no way to insert them into
  `Function`s after the fact. In particular, `moveAfter()` and `moveBefore()` require that the basic
  block is already linked.
  Extract the logic for initially linking a `BasicBlock` out of the constructor and into a member
  function that can be used for lazy insertion.
  - Asserts that the basic block is currently unlinked.
  - Matches the logic of the constructor.
  - Changed the constructor to use it since the logic matches.
  This is needed in a follow-up commit for PR5680.
  llvm-svn: 214563
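A minimal usage sketch of the new API; the surrounding function and names are illustrative, and the API itself is as described in the commit above:

    #include "llvm/IR/BasicBlock.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/LLVMContext.h"
    using namespace llvm;

    // Create a block with no parent now, then link it into a function later --
    // the pattern the BitcodeReader change above relies on.
    static void attachLazily(LLVMContext &Ctx, Function &F) {
      BasicBlock *BB = BasicBlock::Create(Ctx, "lazy"); // unlinked: no parent yet
      // ... the block can already be referenced here, e.g. by a BlockAddress ...
      BB->insertInto(&F); // now append it to F, mirroring the constructor's logic
    }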
* [dfsan] Correctly handle loads and stores of zero size. (Peter Collingbourne, 2014-08-01; 1 file, -0/+8)
  llvm-svn: 214561