summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
* [XRay] Support for for tail calls for ARM no-ThumbDean Michael Berris2016-10-181-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds simplified support for tail calls on ARM with XRay instrumentation. Known issue: compiled with generic flags: `-O3 -g -fxray-instrument -Wall -std=c++14 -ffunction-sections -fdata-sections` (this list doesn't include my specific flags like --target=armv7-linux-gnueabihf etc.), the following program #include <cstdio> #include <cassert> #include <xray/xray_interface.h> [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fC() { std::printf("In fC()\n"); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fB() { std::printf("In fB()\n"); fC(); } [[clang::xray_always_instrument]] void __attribute__ ((noinline)) fA() { std::printf("In fA()\n"); fB(); } // Avoid infinite recursion in case the logging function is instrumented (so calls logging // function again). [[clang::xray_never_instrument]] void simplyPrint(int32_t functionId, XRayEntryType xret) { printf("XRay: functionId=%d type=%d.\n", int(functionId), int(xret)); } int main(int argc, char* argv[]) { __xray_set_handler(simplyPrint); printf("Patching...\n"); __xray_patch(); fA(); printf("Unpatching...\n"); __xray_unpatch(); fA(); return 0; } gives the following output: Patching... XRay: functionId=3 type=0. In fA() XRay: functionId=3 type=1. XRay: functionId=2 type=0. In fB() XRay: functionId=2 type=1. XRay: functionId=1 type=0. XRay: functionId=1 type=1. In fC() Unpatching... In fA() In fB() In fC() So for function fC() the exit sled seems to be called too much before function exit: before printing In fC(). Debugging shows that the above happens because printf from fC is also called as a tail call. So first the exit sled of fC is executed, and only then printf is jumped into. So it seems we can't do anything about this with the current approach (i.e. within the simplification described in https://reviews.llvm.org/D23988 ). Differential Revision: https://reviews.llvm.org/D25030 llvm-svn: 284456
* [AVX-512] Add test case to check shuffle decoding for masked vpermilps for ↵Craig Topper2016-10-181-0/+19
| | | | | | | | r284450. This is harder to do for vpermilpd as shuffle combining turns the constant vector into an immediate since all vpermilpd's inputs with constant vector can also be encoded with the immediate form. llvm-svn: 284455
* [X86] Fix DecodeVPERMVMask to handle cases where the constant pool entry has ↵Craig Topper2016-10-181-8/+2
| | | | | | | | a different type than the shuffle itself. This is especially important for 32-bit targets with 64-bit shuffle elements. llvm-svn: 284453
* [AVX-512] Fix DecodeVPERMV3Mask to handle cases where the constant pool ↵Craig Topper2016-10-181-16/+3
| | | | | | | | | | | | | | entry has a different type than the shuffle itself. Summary: This is especially important for 32-bit targets with 64-bit shuffle elements.This is similar to how PSHUFB and VPERMIL handle the same problem. Reviewers: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D25666 llvm-svn: 284451
* [AMDGPU] Mark .note section SHF_ALLOC so lld creates a segment for itKonstantin Zhuravlyov2016-10-171-0/+5
| | | | | | Differential Revision: https://reviews.llvm.org/D25694 llvm-svn: 284435
* Next set of additional error checks for invalid Mach-O files for theKevin Enderby2016-10-175-0/+12
| | | | | | | | | | | | load commands that use the MachO::sub_framework_command, MachO::sub_umbrella_command, MachO::sub_library_command and MachO::sub_client_command types but are not used in llvm libObject code but used in llvm tool code. This includes the LC_SUB_FRAMEWORK, LC_SUB_UMBRELLA, LC_SUB_LIBRARY and LC_SUB_CLIENT load commands. llvm-svn: 284431
* remove FIXME comment (fixed with r284424); NFCSanjay Patel2016-10-171-2/+0
| | | | llvm-svn: 284427
* [DAG] use isConstOrConstSplat in ComputeNumSignBits to optimize SRASanjay Patel2016-10-171-10/+0
| | | | | | | | | | | | | | The scalar version of this pattern was noted in: https://reviews.llvm.org/D25485 and fixed with: https://reviews.llvm.org/rL284395 More refactoring of the constant/splat helpers is needed and will happen in follow-up patches. Differential Revision: https://reviews.llvm.org/D25685 llvm-svn: 284424
* [opt] Strip coverage if debug info is not present.Davide Italiano2016-10-171-0/+20
| | | | | | | | | | | | | | | If -coverage is passed, but -g is not, clang populates the PassManager pipeline with StripSymbols(debugOnly = true). The stripSymbol pass therefore scans the list of named metadata, drops !llvm.dbg.cu, but leaves !llvm.gcov and !0 (the compileUnit MD) around. The verifier runs, and finds out that there's a CU not listed in !llvm.dbg.cu (as it was previously dropped) -> crash. When we strip debug info, so, check if there's coverage data, and strip it as well, in order to avoid pending metadata left around. Differential Revision: https://reviews.llvm.org/D25689 llvm-svn: 284418
* Ignore debug info when making optimization decisions in SimplifyCFG.Dehao Chen2016-10-171-4/+17
| | | | | | | | | | | | Summary: Debug info should *not* affect code generation. This patch properly handles debug info to make sure the generated code are the same with or without debug info. Reviewers: davidxl, mzolotukhin, jmolloy Subscribers: aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D25286 llvm-svn: 284415
* Handle relocations to thumb functions when dynamic linking COFF modulesWalter Erquinigo2016-10-171-4/+30
| | | | | | | | | | | | | | | | Summary: This adds the necessary logic to support relocations to thumb functions in the COFF dynamic linker. The jumps to function addresses are mostly blx, which requires the ISA selection bit when jumping to a thumb function. Note: I'm determining if the relocation requires the ISA bit when creating the relocation entries and not when resolving the relocation. I have to do that because I need the ObjectFile and the actual Symbol, which are available only when creating the entries. It would require a gross refactor if I do it otherwise, but I'm okay with doing it if you think it's better. Reviewers: peter.smith, compnerd Subscribers: rengolin, sas Differential Revision: https://reviews.llvm.org/D25151 llvm-svn: 284410
* GlobalISel: support wider range of load/store sizes in AArch64.Tim Northover2016-10-171-0/+160
| | | | llvm-svn: 284406
* AMDGPU/SI: Fix LowerParameter() for i16 argumentsTom Stellard2016-10-171-5/+1
| | | | | | | | | | | | | | Summary: If we are loading an i16 value from a 32-bit memory location, then we need to be able to truncate the loaded value to i16. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, tony-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D25198 llvm-svn: 284397
* [DAG] optimize away an arithmetic-right-shift of a 0 or -1 valueSanjay Patel2016-10-171-3/+0
| | | | | | | | | This came up as part of: https://reviews.llvm.org/D25485 Note that the vector case is missed because ComputeNumSignBits() is deficient for vectors. llvm-svn: 284395
* [x86] add tests to show missing DAG folds for arithmetic-shift-rightSanjay Patel2016-10-171-0/+44
| | | | llvm-svn: 284394
* [x86] auto-generate checksSanjay Patel2016-10-171-0/+17
| | | | llvm-svn: 284393
* [Object/ELF] - Check Header->e_shoff value earlier and do not crash.George Rimar2016-10-172-0/+4
| | | | | | | | | Patch checks that section pointer is aligned properly. This should be done before getStringTable() call. Differential revision: https://reviews.llvm.org/D25462 llvm-svn: 284387
* [SDAG] Use ABI type alignment for constant pools when optimizing for sizeJames Molloy2016-10-171-0/+19
| | | | | | | | SelectionDAG::getConstantPool will automatically determine an appropriate alignment if one is not specified. It does this by querying the type's preferred alignment. This can end up creating quite a lot of padding when the preferred alignment for vectors is 128. In optimize-for-size mode, it makes sense to instead query the ABI type alignment which is often smaller and causes less padding. llvm-svn: 284381
* [SimplifyCFG] Don't lower complex ConstantExprs to lookup tablesOliver Stannard2016-10-171-0/+40
| | | | | | | | | | Not all ConstantExprs can be represented by a global variable, for example most pointer arithmetic other than addition of a constant, so we can't convert these values from switch statements to lookup tables. Differential Revision: https://reviews.llvm.org/D25550 llvm-svn: 284379
* [SCEV] Consider delinearization pattern with extension with identity factorTobias Grosser2016-10-171-0/+64
| | | | | | | | | | | | Summary: The delinearization algorithm did not consider terms which had an extension without a multiply factor, i.e. a identify factor. We lose cases where size is char type where there will no multiply factor. Reviewers: sanjoy, grosser Subscribers: mzolotukhin, Eugene.Zelenko, llvm-commits, mssimpso, sanjoy, grosser Differential Revision: https://reviews.llvm.org/D16492 llvm-svn: 284378
* [CodeGenPrepare] When moving a zext near to its associated load, do not ↵Andrea Di Biagio2016-10-171-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | retain the original debug location. CodeGenPrepare knows how to move a zext of a load into the same basic block where the load lives. The goal is to help ISel match a zero-extending load instead of two separated instructions. CGP attempts to move a zext computation even if it lives in a basic block that does not post-dominate the load's basic block. That means, the hoisted zext may be speculated. Preserving the zext location would hurt the debugging experience and the quality of sample pgo. With this patch, when moving a zext near to its associated load, CGP no longer propagates the zext's debug location. Instead, CGP conservatively reuses the same debug location for the load and the zext. An alternative approach would be to assign an artificial line-0 location to the zext. However we don't want to over-use the 'line-0' for this particular case because it would have a size cost in the line-table section for no additional benefit. Differential Revision: https://reviews.llvm.org/D25611 llvm-svn: 284377
* Recommit r284371 "[Object/ELF] - Check that e_shnum is null when e_shoff is."George Rimar2016-10-174-0/+3
| | | | | | | | | | | | | | | | | | | | | With fix: hex edited the precompiled inputs from another testcases to pass new checks. Original commit message: [Object/ELF] - Check that e_shnum is null when e_shoff is. Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) : e_shnum This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero. Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540 That was the reason of crash in lld on incorrect input file. Binary reduced using afl-min. Differential revision: https://reviews.llvm.org/D25090 llvm-svn: 284374
* Revert r284371 "[Object/ELF] - Check that e_shnum is null when e_shoff is."George Rimar2016-10-172-3/+0
| | | | | | | It broke build bot: http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/908/steps/test-stage1-compiler/logs/stdio llvm-svn: 284373
* [Object/ELF] - Check that e_shnum is null when e_shoff is.George Rimar2016-10-172-0/+3
| | | | | | | | | | | | | | | Spec says (http://www.sco.com/developers/gabi/1998-04-29/ch4.eheader.html) : e_shnum This member holds the number of entries in the section header table. Thus the product of e_shentsize and e_shnum gives the section header table's size in bytes. If a file has no section header table, e_shnum holds the value zero. Revealed using "id_000037,sig_11,src_000015,op_havoc,rep_8" from PR30540 That was the reason of crash in lld on incorrect input file. Binary reduced using afl-min. Differential revision: https://reviews.llvm.org/D25090 llvm-svn: 284371
* [Object/ELF] - Do not crash on invalid section index.George Rimar2016-10-172-1/+1
| | | | | | | | | | | | | | | | If object has wrong (large) string table index and also incorrect large value for amount of sections in total, then section index passes the check: if (Index >= getNumSections()) return object_error::invalid_section_index; But result pointer then is far after end of file data, what result in a crash. Differential revision: https://reviews.llvm.org/D25081 llvm-svn: 284369
* [AVX-512] Add shuffle combining support for vpermi2var shuffles derived from ↵Craig Topper2016-10-171-32/+0
| | | | | | existing support for vpermt2var. llvm-svn: 284357
* [AVX-512] Add vpermi2var test cases to shuffle combining test case. ↵Craig Topper2016-10-171-0/+112
| | | | | | Combining will be added in a future commit. llvm-svn: 284356
* [AVX-512] Add support for turning a 256-bit load that goes to both halfs of ↵Craig Topper2016-10-162-60/+30
| | | | | | | | an insert_subvector into a subvector broadcast. Differential Revision: https://reviews.llvm.org/D25650 llvm-svn: 284353
* [AVX-512] Fix the operand order for vpermi2var_qi intrinsics to match the ↵Craig Topper2016-10-162-9/+9
| | | | | | other vpermi2var intrinsics. llvm-svn: 284329
* [AVX-512] Correct execution domain for VPERMT2PS and VPERMI2PS.Craig Topper2016-10-167-90/+90
| | | | llvm-svn: 284328
* [GVN/PRE] Hoist global values outside of loops.Davide Italiano2016-10-152-10/+57
| | | | | | | | | | | In theory this could be generalized to move anything where we prove the operands are available, but that would require rewriting PRE. As NewGVN will hopefully come soon, and we're trying to rewrite PRE in terms of NewGVN+MemorySSA, it's probably not worth spending too much time on it. Fix provided by Daniel Berlin! llvm-svn: 284311
* [X86][SSE] Added some basic examples of knownbits failing for vector typesSimon Pilgrim2016-10-151-0/+119
| | | | | | computeKnownBits only returns the common bits of each vector element instead of only the elements that are actually used llvm-svn: 284308
* [X86] Regenerate known bits testSimon Pilgrim2016-10-151-4/+18
| | | | llvm-svn: 284306
* [AVX-512] Add shuffle comments for vbroadcast instructions.Craig Topper2016-10-155-92/+98
| | | | llvm-svn: 284305
* AMDGPU/SI: Handle s_getreg hazard in GCNHazardRecognizerTom Stellard2016-10-151-5/+62
| | | | | | | | | | Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25526 llvm-svn: 284298
* GlobalISel: rename legalizer components to match others.Tim Northover2016-10-1423-23/+23
| | | | | | | | | | The previous names were both misleading (the MachineLegalizer actually contained the info tables) and inconsistent with the selector & translator (in having a "Machine") prefix. This should make everything sensible again. The only functional change is the name of a couple of command-line options. llvm-svn: 284287
* PowerPC: specify full triple to avoid different Darwin asm syntax.Tim Northover2016-10-141-1/+1
| | | | llvm-svn: 284281
* [ARM] add tests for PR30660Sanjay Patel2016-10-141-0/+26
| | | | llvm-svn: 284280
* [PowerPC] add tests for PR30661Sanjay Patel2016-10-141-0/+26
| | | | llvm-svn: 284279
* [PPC] Shorter sequence to load 64bit constant with same hi/lo wordsGuozhi Wei2016-10-141-0/+11
| | | | | | | | | | | | This is a patch to implement pr30640. When a 64bit constant has the same hi/lo words, we can use rldimi to copy the low word into high word of the same register. This optimization caused failure of test case bperm.ll because of not optimal heuristic in function SelectAndParts64. It chooses AND or ROTATE to extract bit groups from a register, and OR them together. This optimization lowers the cost of loading 64bit constant mask used in AND method, and causes different code sequence. But actually ROTATE method is better in this test case. The reason is in ROTATE method the final OR operation can be avoided since rldimi can insert the rotated bits into target register directly. So this patch also enhances SelectAndParts64 to prefer ROTATE method when the two methods have same cost and there are multiple bit groups need to be ORed together. Differential Revision: https://reviews.llvm.org/D25521 llvm-svn: 284276
* AMDGPU/SI: Use new SimplifyDemandedBits helper for multi-use operationsTom Stellard2016-10-143-8/+65
| | | | | | | | | | | | | Summary: We are using this helper for our 24-bit arithmetic combines, so we are now able to eliminate multi-use operations that mask the high-bits of 24-bit inputs (e.g. and x, 0xffffff) Reviewers: arsenm, nhaehnle Subscribers: tony-tye, arsenm, kzhuravl, wdng, nhaehnle, llvm-commits, yaxunl Differential Revision: https://reviews.llvm.org/D24672 llvm-svn: 284267
* Add a pass to optimize patterns of vectorized interleaved memory accesses forDavid L Kreitzer2016-10-143-0/+236
| | | | | | | | | | | | | X86. The pass optimizes as a unit the entire wide load + shuffles pattern produced by interleaved vectorization. This initial patch optimizes one pattern (64-bit elements interleaved by a factor of 4). Future patches will generalize to additional patterns. Patch by Farhana Aleen Differential revision: http://reviews.llvm.org/D24681 llvm-svn: 284260
* AMDGPU/SI: Don't allow unaligned scratch accessTom Stellard2016-10-143-17/+43
| | | | | | | | | | | | Summary: The hardware doesn't support this. Reviewers: arsenm Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, llvm-commits, tony-tye Differential Revision: https://reviews.llvm.org/D25523 llvm-svn: 284257
* [safestack] Use non-thread-local unsafe stack pointer for Contiki OSDavid L Kreitzer2016-10-141-2/+1
| | | | | | | | Patch by Michael LeMay Differential revision: http://reviews.llvm.org/D19852 llvm-svn: 284254
* [X86] Take advantage of the lzcnt instruction on btver2 architectures when ↵Pierre Gousseau2016-10-141-0/+341
| | | | | | | | | | | | | | | | ORing comparisons to zero. This change adds transformations such as: zext(or(setcc(eq, (cmp x, 0)), setcc(eq, (cmp y, 0)))) To: srl(or(ctlz(x), ctlz(y)), log2(bitsize(x)) This optimisation is beneficial on Jaguar architecture only, where lzcnt has a good reciprocal throughput. Other architectures such as Intel's Haswell/Broadwell or AMD's Bulldozer/PileDriver do not benefit from it. For this reason the change also adds a "HasFastLZCNT" feature which gets enabled for Jaguar. Differential Revision: https://reviews.llvm.org/D23446 llvm-svn: 284248
* [InstCombine] use m_APInt to allow sub with constant folds for splat vectorsSanjay Patel2016-10-142-5/+3
| | | | llvm-svn: 284247
* [InstCombine] add tests for missing vector foldsSanjay Patel2016-10-142-0/+33
| | | | llvm-svn: 284245
* [InstCombine] auto-generate checksSanjay Patel2016-10-141-22/+24
| | | | llvm-svn: 284244
* [InstCombine] remove redundant testSanjay Patel2016-10-141-11/+0
| | | | | | | | This test was apparently checking for 2 independent folds, but we have plenty of tests for those individual folds already. We are lacking vector tests, however, because we don't have the shift folds for vectors. llvm-svn: 284243
* [InstCombine] update test to use FileCheck and auto-generate checksSanjay Patel2016-10-141-71/+148
| | | | llvm-svn: 284242
OpenPOWER on IntegriCloud