summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
* [Power9] Implement new altivec instructions: bcd* seriesChuang-Yu Cheng2016-03-283-0/+126
| | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the following altivec instructions: - Decimal Convert From/to National/Zoned/Signed-QWord: bcdcfn. bcdcfz. bcdctn. bcdctz. bcdcfsq. bcdctsq. - Decimal Copy-Sign/Set-Sign: bcdcpsgn. bcdsetsgn. - Decimal Shift/Unsigned-Shift/Shift-and-Round: bcds. bcdus. bcdsr. - Decimal (Unsigned) Truncate: bcdtrunc. bcdutrunc. Total 13 instructions Thanks Amehsan's advice! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D17838 llvm-svn: 264568
* [Power9] Implement new vsx instructions: insert, extract, test data class, ↵Chuang-Yu Cheng2016-03-287-0/+345
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | min/max, reverse, permute, splat This change implements the following vsx instructions: - Scalar Insert/Extract xsiexpdp xsiexpqp xsxexpdp xsxsigdp xsxexpqp xsxsigqp - Vector Insert/Extract xviexpdp xviexpsp xvxexpdp xvxexpsp xvxsigdp xvxsigsp xxextractuw xxinsertw - Scalar/Vector Test Data Class xststdcdp xststdcsp xststdcqp xvtstdcdp xvtstdcsp - Maximum/Minimum xsmaxcdp xsmaxjdp xsmincdp xsminjdp - Vector Byte-Reverse/Permute/Splat xxbrd xxbrh xxbrq xxbrw xxperm xxpermr xxspltib 30 instructions Thanks Nemanja for invaluable discussion! Thanks Kit's great help! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16842 llvm-svn: 264567
* AVX-512: Fixed ICMP instruction selection for i1 operandsElena Demikhovsky2016-03-281-5/+9
| | | | | | | | | | ICMP instruction selection fails on SKX and KNL for i1 operand. I use XOR to resolve: (A == B) is equivalent to (A xor B) == 0 Differential Revision: http://reviews.llvm.org/D18511 llvm-svn: 264566
* [Power9] Implement new vsx instructions: quad-precision move, fp-arithmeticChuang-Yu Cheng2016-03-282-0/+171
| | | | | | | | | | | | | | | | | | | | This change implements the following vsx instructions: - quad-precision move xscpsgnqp, xsabsqp, xsnegqp, xsnabsqp - quad-precision fp-arithmetic xsaddqp(o) xsdivqp(o) xsmulqp(o) xssqrtqp(o) xssubqp(o) xsmaddqp(o) xsmsubqp(o) xsnmaddqp(o) xsnmsubqp(o) 22 instructions Thanks Nemanja and Kit for careful review and invaluable discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan http://reviews.llvm.org/D16110 llvm-svn: 264565
* [Coverage] Fix the way we load "<unknown>:func" recordsVedant Kumar2016-03-281-1/+1
| | | | | | | | | When emitting coverage mappings for functions with local linkage and an unknown filename, we use "<unknown>:func" for the PGO function name. The problem is that we don't strip "<unknown>" from the name when loading coverage data, like we do for other file names. Fix that and add a test. llvm-svn: 264559
* BitcodeWriter: Replace dead code with an assertion, NFCDuncan P. N. Exon Smith2016-03-281-7/+1
| | | | | | | The caller of ValueEnumerator::EnumerateOperandType never sends in metadata. Assert that, and remove the unnecessary logic. llvm-svn: 264558
* BitcodeWriter: Reuse writeMetadataRecords, NFCDuncan P. N. Exon Smith2016-03-271-5/+2
| | | | | | | | Change writeFunctionMetadata to call writeMetadataRecords. For now there's no functionality change, but makes it easy to serialize other types of metadata in the function block in the future. llvm-svn: 264557
* BitcodeWriter: Rename some functions for consistency, NFCDuncan P. N. Exon Smith2016-03-271-35/+34
| | | | | | | | | | | | | | | | To match writeMetadataRecords, writeNamedMetadata and writeMetadataStrings, change: WriteModuleMetadata => writeModuleMetadata WriteFunctionLocalMetadata => writeFunctionMetadata Write##CLASS => write##CLASS The only major change is "FunctionLocal" => "Function". The point is to be less specific, in preparation for emitting normal metadata records inside function metadata blocks (currently we only emit `LocalAsMetadata` there). llvm-svn: 264556
* BitcodeWriter: Split out writeMetadataRecords, NFCDuncan P. N. Exon Smith2016-03-271-9/+17
| | | | | | | Besides being a nice cleanup, this is preparation for reusing the code in function metadata blocks. llvm-svn: 264555
* BitcodeWriter: Restructure WriteFunctionLocalMetadata, NFCDuncan P. N. Exon Smith2016-03-271-11/+9
| | | | | | Use an early return to simplify logic. llvm-svn: 264554
* BitcodeWriter: Simplify tracking of function-local metadata, NFCDuncan P. N. Exon Smith2016-03-273-12/+5
| | | | | | | | We don't really need a separate vector here; instead, point at a range inside the main MDs array. This matches how r264551 references the ranges of strings and non-strings. llvm-svn: 264552
* Reapply ~"Bitcode: Collect all MDString records into a single blob"Duncan P. N. Exon Smith2016-03-274-33/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Spiritually reapply commit r264409 (reverted in r264410), albeit with a bit of a redesign. Firstly, avoid splitting the big blob into multiple chunks of strings. r264409 imposed an arbitrary limit to avoid a massive allocation on the shared 'Record' SmallVector. The bug with that commit only reproduced when there were more than "chunk-size" strings. A test for this would have been useless long-term, since we're liable to adjust the chunk-size in the future. Thus, eliminate the motivation for chunk-ing by storing the string sizes in the blob. Here's the layout: vbr6: # of strings vbr6: offset-to-blob blob: [vbr6]: string lengths [char]: concatenated strings Secondly, make the output of llvm-bcanalyzer readable. I noticed when debugging r264409 that llvm-bcanalyzer was outputting a massive blob all in one line. Past a small number, the strings were impossible to split in my head, and the lines were way too long. This version adds support in llvm-bcanalyzer for pretty-printing. <STRINGS abbrevid=4 op0=3 op1=9/> num-strings = 3 { 'abc' 'def' 'ghi' } From the original commit: Inspired by Mehdi's similar patch, http://reviews.llvm.org/D18342, this should (a) slightly reduce bitcode size, since there is less record overhead, and (b) greatly improve reading speed, since blobs are super cheap to deserialize. llvm-svn: 264551
* Support: Implement StreamingMemoryObject::getPointerDuncan P. N. Exon Smith2016-03-272-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | The implementation is fairly obvious. This is preparation for using some blobs in bitcode. For clarity (and perhaps future-proofing?), I moved the call to JumpToBit in BitstreamCursor::readRecord ahead of calling MemoryObject::getPointer, since JumpToBit can theoretically (a) read bytes, which (b) invalidates the blob pointer. This isn't strictly necessary the two memory objects we have: - The return of RawMemoryObject::getPointer is valid until the memory object is destroyed. - StreamingMemoryObject::getPointer is valid until the next chunk is read from the stream. Since the JumpToBit call is only going ahead to a word boundary, we'll never load another chunk. However, reordering makes it clear by inspection that the blob returned by BitstreamCursor::readRecord will be valid. I added some tests for StreamingMemoryObject::getPointer and BitstreamCursor::readRecord. llvm-svn: 264549
* Bitcode: Add SimpleBitstreamCursor::getPointerToByte, etc.Duncan P. N. Exon Smith2016-03-271-3/+1
| | | | | | | | | | | | | | | Add API to SimpleBitstreamCursor to allow users to translate between byte addresses and pointers. - jumpToPointer: move the bit position to a particular pointer. - getPointerToByte: get the pointer for a particular byte. - getPointerToBit: get the pointer for the byte of the current bit. - getCurrentByteNo: convenience function for assertions and tests. Mainly adds unit tests (getPointerToBit/Byte already has a use), but also preparation for eventually using jumpToPointer. llvm-svn: 264546
* Bitcode: Split out SimpleBitstreamCursorDuncan P. N. Exon Smith2016-03-271-9/+13
| | | | | | | | | | | Split out SimpleBitstreamCursor from BitstreamCursor, which is a lower-level cursor with no knowledge of bitcode blocks, abbreviations, or records. It just knows how to read bits and navigate the stream. This is mainly organizational, to separate the API for manipulating raw bits from that for bitcode concepts like Record and Block. llvm-svn: 264545
* [ThinLTO] Add optional import message and statisticsTeresa Johnson2016-03-271-0/+14
| | | | | | | | | | | | | | | | | | | | | | Summary: Add a statistic to count the number of imported functions. Also, add a new -print-imports option to emit a trace of imported functions, that works even for an NDEBUG build. Note that emitOptimizationRemark does not work for the above printing as it expects a Function object and DebugLoc, neither of which we have with summary-based importing. This is part 2 of D18487, the first part was committed separately as r264536. Reviewers: joker.eph Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18487 llvm-svn: 264537
* [ThinLTO] Don't try to import alias unless aliasee can be importedTeresa Johnson2016-03-271-4/+4
| | | | | | | | | | | | | | | | | | With r264503, aliases are now being added to the GlobalsToImport set even when their aliasees can't be imported due to their linkage type. While the importing worked correctly (the aliases imported as declarations) due to the logic in doImportAsDefinition, there is no point to adding them to the GlobalsToImport set. Additionally, with D18487 it was resulting in incorrectly printing a message indicating that the alias was imported. To avoid this, delay adding aliases to the GlobalsToImport set until after the linkage type of the aliasee is checked. This patch is part of D18487. llvm-svn: 264536
* [PowerPC] Map max/minnum intrinsics and fmax/fmin to ISD nodes for CTR-based ↵Hal Finkel2016-03-271-3/+13
| | | | | | | | | | | | | | | | | loop legality Intrinsic::maxnum and Intrinsic::minnum, along with the associated libc function calls (fmax[f], etc.) generally map to function calls after lowering. For some vector types with QPX at least, however, we can legally lower these, and we don't need to prohibit CTR-based loops on their account. It turned out, however, that the logic that checked the opcodes associated with intrinsics was broken (it would set the Opcode variable, but that variable was later checked only if set for some otherwise-external function call. This fixes the latter problem and adds the FMAX/MINNUM mappings. llvm-svn: 264532
* [Verifier] Reject PHIs using defs from own block.Michael Kruse2016-03-261-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reject the following IR as malformed (assuming that %entry, %next are not in a loop): next: %y = phi i32 [ 0, %entry ] %x = phi i32 [ %y, %entry ] Such PHI nodes came up in PR26718. While there was no consensus on whether or not this is valid IR, most opinions on that bug and in a discussion on the llvm-dev mailing list tended towards a "strict interpretation" (term by Joseph Tremoulet) of PHI node uses. Also, the language reference explicitly states that "the use of each incoming value is deemed to occur on the edge from the corresponding predecessor block to the current block" and `DominatorTree::dominates(Instruction*, Use&)` uses this definition as well. For the code mentioned in PR15384, clang does not compile to such PHIs (anymore?). The test case still hangs when replacing `%tmp6` with `%tmp` in revisions before r176366 (where PR15384 has been fixed). The occurrence of %tmp6 therefore was probably unintentional. Its value is not used except in other PHIs. Reviewers: majnemer, reames, JosephTremoulet, bkramer, grosser, jdoerfert, kparzysz, sanjoy Differential Revision: http://reviews.llvm.org/D18443 llvm-svn: 264528
* [SimplifyCFG] propagate branch metadata when creating select (PR26636)Sanjay Patel2016-03-261-2/+2
| | | | llvm-svn: 264527
* [X86][AVX] Enabled SMUL_LOHI/UMUL_LOHI v8i32 vectors on AVX1 targetsSimon Pilgrim2016-03-261-0/+21
| | | | | | Correct splitting of v8i32 vectors into v4i32 vectors to prevent scalarization llvm-svn: 264517
* Revert "NFC: static_assert instead of comment"JF Bastien2016-03-261-3/+1
| | | | | | | | | | | This reverts commit fa36fcff16c7d4f78204d6296bf96c3558a4a672. Causes the following Windows failure: C:\Buildbot\Slave\llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast\llvm.src\lib\CodeGen\MachineInstr.cpp(762): error C2338: must be trivially copyable to memmove llvm-svn: 264516
* NFC: static_assert instead of commentJF Bastien2016-03-261-1/+3
| | | | | | | | | | Summary: isPodLike is as close as we have for is_trivially_copyable. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D18483 llvm-svn: 264515
* [X86][AVX] Enabled MULHS/MULHU v16i16 vectors on AVX1 targetsSimon Pilgrim2016-03-261-0/+2
| | | | | | | | Correct splitting of v16i16 vectors into v8i16 vectors to prevent scalarization Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264512
* [X86][SSE] Add MULHS/MULHU custom lowering for i8 vectorsSimon Pilgrim2016-03-261-0/+124
| | | | | | | | Currently this is to mainly to prevent scalarization of integer division by constants. Differential Revision: http://reviews.llvm.org/D18307 llvm-svn: 264511
* [X86][AVX512BW] AVX512BW can sign-extend v32i8 to v32i16 for simpler v32i8 ↵Simon Pilgrim2016-03-261-2/+3
| | | | | | | | multiplies. Only pre-AVX512BW targets need to split v32i8 vectors. llvm-svn: 264509
* [PowerPC] Disable the CTR optimization in the presence of {min,max}numDavid Majnemer2016-03-261-0/+2
| | | | | | | | | The minnum and maxnum intrinsics get lowered to libcalls which invalidates the CTR optimization. This fixes PR27083. llvm-svn: 264508
* [X86][SSE] Don't duplicate Lower256IntArith functionality in LowerMul. NFC.Simon Pilgrim2016-03-261-13/+5
| | | | | | LowerMul v32i8 on AVX2 needs to split the 256-bit sources to allow sign-extension back to v16i16 to occur. Since this is basically the same as Lower256IntArith we simplify by using that here instead. llvm-svn: 264506
* Minor code cleanup. NFC.Junmo Park2016-03-261-1/+1
| | | | llvm-svn: 264505
* [Power9] Implement new altivec instructions: permute, count zero, extend ↵Chuang-Yu Cheng2016-03-263-0/+181
| | | | | | | | | | | | | | | | | | | | | sign, negate, parity, shift/rotate, mul10 This change implements the following vector operations: - vclzlsbb vctzlsbb vctzb vctzd vctzh vctzw - vextsb2w vextsh2w vextsb2d vextsh2d vextsw2d - vnegd vnegw - vprtybd vprtybq vprtybw - vbpermd vpermr - vrlwnm vrlwmi vrldnm vrldmi vslv vsrv - vmul10cuq vmul10uq vmul10ecuq vmul10euq 28 instructions Thanks Nemanja, Kit for invaluable hints and discussion! Reviewers: hal, nemanja, kbarton, tjablin, amehsan Phabricator: http://reviews.llvm.org/D15887 llvm-svn: 264504
* ThinLTO: use the callgraph from the combined index to drive the FunctionImporterMehdi Amini2016-03-262-257/+280
| | | | | | | | | | | | | | | | | | Summary: Now that the summary contains the full reference/call graph, we can replace the existing function importer that loads and inspect the IR to iteratively walk the call graph by a traversal based purely on the summary information. Decouple the actual importing decision from any IR manipulation. Reviewers: tejohnson Subscribers: llvm-commits, joker.eph Differential Revision: http://reviews.llvm.org/D18343 From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 264503
* Rename ModuleSummaryIndex::modPathStringEntries() into modulePaths()Mehdi Amini2016-03-261-1/+1
| | | | | | | It now return the map instead of an iterator. From: Mehdi Amini <mehdi.amini@apple.com> llvm-svn: 264489
* [Support] Switch to RAII helper for error-as-out-parameter idiom.Lang Hames2016-03-251-2/+2
| | | | | | As discussed on the llvm-commits thread for r264467. llvm-svn: 264479
* Improve the reliability of file renaming in Windows by having the compiler retrySunil Srivastava2016-03-251-16/+33
| | | | | | | | | | | the rename operation on 3 error conditions of ReplaceFileW() that it was previously bailing out on. Patch by Douglas Yung! Differential Revision: http://reviews.llvm.org/D17903 llvm-svn: 264477
* [Object] Make createMachOObjectFile return Expected<...> rather thanLang Hames2016-03-253-12/+9
| | | | | | ErrorOr<...>. llvm-svn: 264473
* Allow value forwarding past release fences in GVNPhilip Reames2016-03-251-0/+9
| | | | | | | | | | | | | | A release fence acts as a publication barrier for stores within the current thread to become visible to other threads which might observe the release fence. It does not require the current thread to observe stores performed on other threads. As a result, we can allow store-load and load-load forwarding across a release fence. We choose to be much more conservative about stores. In theory, nothing prevents us from shifting a store from after a release fence to before it, and then eliminating the preceeding (previously fenced) store. Doing this without actually moving the second store is likely also legal, but we chose to be conservative at this time. The LangRef indicates only atomic loads and stores are effected by fences. This patch chooses to be far more conservative then that. This is the GVN companion to http://reviews.llvm.org/D11434 which applied the same logic in EarlyCSE and has been baking in tree for a while now. Differential Revision: http://reviews.llvm.org/D11436 llvm-svn: 264472
* [Object] Make MachOObjectFile's constructor private, provide a static createLang Hames2016-03-251-29/+26
| | | | | | | | | | method instead. This is not quite a named constructor: Construction may fail, and MachOObjectFiles are usually passed by unique_ptr anyway, so create returns an Expected<std::unique_ptr<MachOObjectFile>>. llvm-svn: 264469
* [X86] Emit a proper ADJCALLSTACKDOWN in EmitLoweredTLSAddrDavid Majnemer2016-03-251-1/+1
| | | | | | | | | We forgot to add the second machine operand to our ADJCALLSTACKDOWN, resulting in crashes in PEI. This fixes PR27071. llvm-svn: 264465
* [MachineCopyPropagation] Expose more dead copies across instructions with ↵Jun Bum Lim2016-03-251-3/+14
| | | | | | | | | | | regmasks When encountering instructions with regmasks, instead of cleaning up all the elements in MaybeDeadCopies map, remove only the instructions erased. By keeping more instruction in MaybeDeadCopies, this change will expose more dead copies across instructions with regmasks. llvm-svn: 264462
* Prevent construction of cycle in DAG store mergeNirav Dave2016-03-253-40/+44
| | | | | | | | | | | | | | | | | | | | | When merging stores in DAGCombiner, add check to ensure that no dependenices exist that would cause the construction of a cycle in our DAG. This may happen if one store has a data dependence on another instruction (e.g. a load) which itself has a (chain) dependence on another store being merged. These stores cannot be merged safely and doing so results in a cycle that is discovered in LegalizeDAG. This test is only done in cases where Antialias analysis is used (UseAA) as non-AA store merge candidates will be merged logically after all loads which have been checked to not alias. Reviewers: ahatanak, spatel, niravd, arsenm, hfinkel, tstellarAMD, jyknight Subscribers: llvm-commits, tberghammer, danalbert, srhines Differential Revision: http://reviews.llvm.org/D18336 llvm-svn: 264461
* [libFuzzer] use fflush after every PrintfKostya Serebryany2016-03-251-0/+1
| | | | llvm-svn: 264459
* Remove useless and unused CrashRecoveryContext::getBacktrace(). This ↵Richard Smith2016-03-251-8/+0
| | | | | | function always returned an empty string. llvm-svn: 264458
* [RS4GC] Lower calls to @llvm.experimental.deoptimizeSanjoy Das2016-03-252-10/+30
| | | | | | | | | | | | | | This changes RS4GC to lower calls to ``@llvm.experimental.deoptimize`` to gc.statepoints wrapping ``__llvm_deoptimize``, and changes ``callsGCLeafFunction`` to recognize ``@llvm.experimental.deoptimize`` as a non GC leaf function. I've had to hard code the ``"__llvm_deoptimize"`` name in RewriteStatepointsForGC; since ``TargetLibraryInfo`` is available only during codegen. This isn't without precedent in the codebase, so I'm not overtly concerned. llvm-svn: 264456
* CodeGen: Don't iterate over operands after we've erased an MIJustin Bogner2016-03-251-13/+20
| | | | | | | | | | | This fixes a use-after-free introduced 3 years ago, in r182872 ;) The code more or less worked because the memory that CopyMI was pointing to happened to still be valid, but lots of tests would crash if you ran under ASAN with the recycling allocator changes from llvm.org/PR26808 llvm-svn: 264455
* ARM: maintain BB ordering when expanding WIN__DBZCHKSaleem Abdulrasool2016-03-251-1/+1
| | | | | | | | | | | | | It is possible to have a fallthrough MBB prior to MBB placement. The original addition of the BB would result in reordering the BB as not preceding the successor. Because of the fallthrough nature of the BB, we could end up executing incorrect code or even a constant pool island! Insert the spliced BB into the same location to avoid that. Thanks to Tim Northover for invaluable hints and Fiora for the discussion on what may have been occurring! llvm-svn: 264454
* [ThinLTO] Rename edges() to calls() for clarity (NFC)Teresa Johnson2016-03-251-3/+3
| | | | | | Helps distinguish from refs() which iterates over non-call references. llvm-svn: 264445
* CodeGen: Fix a use-after-free in TIIJustin Bogner2016-03-251-2/+4
| | | | | | Found by ASAN with the recycling allocator changes from PR26808. llvm-svn: 264443
* AMDGPU: Fix a use-after free and a missing breakJustin Bogner2016-03-251-1/+2
| | | | | | | | | | | | | | | | We're erasing MI here, but then immediately using it again inside the `if`. This moves the erase after we're done using it. Doing that reveals a second problem though - this case is missing a break, so we fall through to the default and dereference MI again. This is obviously a bug, though I don't know how to write a test that triggers it - all we do in the error case is print some extra debug output. Both of these issue crash on lots of tests under ASAN with the recycling allocator changes from PR26808 applied. llvm-svn: 264442
* [X86] Use "and $0" and "orl $-1" to store 0 and -1 when optimizing for minsizeHans Wennborg2016-03-251-0/+12
| | | | | | | | | | | | 64-bit, 32-bit and 16-bit move-immediate instructions are 7, 6, and 5 bytes, respectively, whereas and/or with 8-bit immediate is only three bytes. Since these instructions imply an additional memory read (which the CPU could elide, but we don't think it does), restrict these patterns to minsize functions. Differential Revision: http://reviews.llvm.org/D18374 llvm-svn: 264440
* Consider regmasks when computing register-based DBG_VALUE live rangesReid Kleckner2016-03-252-33/+63
| | | | | | | | | | | | | | Now register parameters that aren't saved to the stack or CSRs are considered dead after the first call. Previously the debugger would show whatever was in the register. Fixes PR26589 Reviewers: aprantl Differential Revision: http://reviews.llvm.org/D17211 llvm-svn: 264429
OpenPOWER on IntegriCloud