path: root/llvm
Commit message  Author  Age  Files  Lines
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-01257-1378/+1535
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! Note: I accidentally committed a bogus older version of this patch previously. llvm-svn: 218787
* LTO: Add missing target triple from r218784Duncan P. N. Exon Smith2014-10-011-0/+2
| | | | llvm-svn: 218786
* Add fptrunc to mips fast-iselReed Kotler2014-10-012-0/+45
| | | | | | | | | | | | | | | | | | Summary: Implement conversion of 64 to 32 bit floating point numbers (fptrunc) in mips fast-isel Test Plan: fptrunc.ll checked also with 4 internal mips build bot flavors mips32r1/mips32r2 and at -O0 and -O2 Reviewers: dsanders Reviewed By: dsanders Subscribers: rfuhler Differential Revision: http://reviews.llvm.org/D5553 llvm-svn: 218785
* LTO: Ignore disabled diagnostic remarksDuncan P. N. Exon Smith2014-10-017-14/+91
| | | | | | | | | | | | | | | | | | | | | | | r206400 and r209442 added remarks that are disabled by default. However, if a diagnostic handler is registered, the remarks are sent unfiltered to the handler. This is the right behaviour for clang, since it has its own filters. However, the diagnostic handler exposed in the LTO API receives only the severity and message. It doesn't have the information to filter by pass name. For LTO, disabled remarks should be filtered by the producer. I've changed `LLVMContext::setDiagnosticHandler()` to take a `bool` argument indicating whether to respect the built-in filters. This defaults to `false`, so other consumers don't have a behaviour change, but `LTOCodeGenerator::setDiagnosticHandler()` sets it to `true`. To make this behaviour testable, I added a `-use-diagnostic-handler` command-line option to `llvm-lto`. This fixes PR21108. llvm-svn: 218784
* Add an immovable type to test Optional<T>::emplace more rigorously after r218732.David Blaikie2014-10-011-5/+26
| | | | | | llvm-svn: 218783
* Revert r218778 while investigating buildbot breakage.Adrian Prantl2014-10-01257-1530/+1378
| | | | | | "Move the complex address expression out of DIVariable and into an extra" llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extraAdrian Prantl2014-10-01257-1378/+1530
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | argument of the llvm.dbg.declare/llvm.dbg.value intrinsics. Previously, DIVariable was a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end. By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too. Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too. The new intrinsics look like this: declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr) declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr) This patch adds a new LLVM-local tag to DIExpressions, so we can detect and pretty-print DIExpression metadata nodes. What this patch doesn't do: This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step. http://reviews.llvm.org/D4919 rdar://problem/17994491 Thanks to dblaikie and dexonsmith for reviewing this patch! llvm-svn: 218778
* R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol tableTom Stellard2014-10-01266-1520/+1521
| | | | | | llvm-svn: 218776
* C API: Add LLVMCloneModule()Tom Stellard2014-10-012-0/+13
| | | | llvm-svn: 218775
* Revert r216862 due to a performance regressionJingyue Wu2014-10-014-59/+28
| | | | | | Reported by Alexey Volkov in PR21115 llvm-svn: 218771
* [mips] Rename emit and parse functions for the .cpload assembler directive. NFC.Toma Tabacu2014-10-013-10/+10
| | | | | | | | | | | | | | Summary: It's better if we have a consistent name for .cpload-related functions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5437 llvm-svn: 218768
* R600/SI: Add a generic pseudo EXP instructionTom Stellard2014-10-013-8/+30
| | | | llvm-svn: 218767
* R600/SI: Add generic pseudo MTBUF instructionsTom Stellard2014-10-013-31/+58
| | | | llvm-svn: 218766
* R600/SI: Add generic pseudo SMRD instructionsTom Stellard2014-10-012-14/+39
| | | | llvm-svn: 218765
* [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5Oliver Stannard2014-10-014-52/+82
| | | | | | | | Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when targeting ARMv8, but they are actually present on any target with FP-ARMv8. Note that FP-ARMv8 is called FPv5 when it is part of an M-profile core, but they have the same instructions so we model them both as FPARMv8 in the ARM backend. llvm-svn: 218763
* [x86] Fix a few more tiny patterns with the new vector shuffle loweringChandler Carruth2014-10-012-5/+213
| | | | | | | | | | | | | | | | | | | that keep cropping up in the regression test suite. This also addresses one of the issues raised on the mailing list with failing to form 'movsd' in as many cases as we realistically should. There will be corresponding patches forthcoming for v4f32 at least. This was a lot of fuss for a relatively small gain, but all the fuss was on my end trying different ways of holding the pieces of the x86 fragment patterns *just right*. Now that it works, the code is reasonably simple. In the new test cases I'm adding here, v2i64 sticks out as just plain horrible. I've not come up with any great ideas here other than that it would be nice to recognize when we're *going* to take a domain crossing hit and cross earlier to get the decent instructions. At least with AVX it is slightly less silly.... llvm-svn: 218756
* [x86] Delete some extraneous logic from the new vector shuffle lowering.Chandler Carruth2014-10-011-7/+0
| | | | | | | | Nothing was relying on this and there are potentially some edge cases that it would not be correct under. Removing it seems better than trying to "fix" it as nothing was relying on it. llvm-svn: 218755
* [AArch64] Allow access to all system registers with MRS/MSR instructions.Tom Coxon2014-10-018-70/+44
| | | | | | | | | | | | | | | | | | | | | The A64 instruction set includes a generic register syntax for accessing implementation-defined system registers. The syntax for these registers is:

    S<op0>_<op1>_<CRn>_<CRm>_<op2>

The encoding space permitted for implementation-defined system registers is:

    op0  op1  CRn   CRm   op2
    11   xxx  1x11  xxxx  xxx

The full encoding space can now be accessed:

    op0  op1  CRn   CRm   op2
    xx   xxx  xxxx  xxxx  xxx

This is useful to anyone needing to write assembly code supporting new system registers before the assembler has learned the official names for them.

llvm-svn: 218753
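The field layout described in r218753 can be sketched in a few lines of C++. This is only an illustration, not the assembler's actual code: `SysRegFields`, `packSysReg`, `isImplementationDefined` and `genericSysRegName` are hypothetical names. Packing the fields into one 16-bit value in column order and spelling CRn/CRm with a `C` prefix are both assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical container for the five fields named in the commit message.
struct SysRegFields {
  unsigned Op0, Op1, CRn, CRm, Op2;
};

// Pack the fields in the column order of the table:
// op0 (2 bits), op1 (3), CRn (4), CRm (4), op2 (3). The 16-bit layout is an
// assumption made for this sketch.
inline uint16_t packSysReg(const SysRegFields &F) {
  assert(F.Op0 < 4 && F.Op1 < 8 && F.CRn < 16 && F.CRm < 16 && F.Op2 < 8);
  return static_cast<uint16_t>((F.Op0 << 14) | (F.Op1 << 11) | (F.CRn << 7) |
                               (F.CRm << 3) | F.Op2);
}

// True when the fields fall in the implementation-defined space
// op0 = 11, CRn = 1x11; op1, CRm and op2 are unconstrained.
inline bool isImplementationDefined(const SysRegFields &F) {
  return F.Op0 == 3 && (F.CRn & 0xB) == 0xB;
}

// Spell the generic S<op0>_<op1>_<CRn>_<CRm>_<op2> name. Writing CRn and CRm
// as "C<n>" is an assumption about the operand spelling.
inline std::string genericSysRegName(const SysRegFields &F) {
  return "S" + std::to_string(F.Op0) + "_" + std::to_string(F.Op1) + "_C" +
         std::to_string(F.CRn) + "_C" + std::to_string(F.CRm) + "_" +
         std::to_string(F.Op2);
}
```

With this sketch, a register with op0 = 3 and CRn = 15 is classified as implementation-defined, while one with op0 = 2 is only reachable through the full encoding space that the commit opens up.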
* Revert r218721, r218735.Evgeniy Stepanov2014-10-015-283/+9
| | | | | | | | | | Failing bootstrap on Linux (arm, x86). http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux/builds/13139/steps/bootstrap%20clang/logs/stdio http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-selfhost/builds/470 http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/8518 llvm-svn: 218752
* Add missing natural vector cast.Asiri Rathnayake2014-10-013-0/+67
| | | | | | | Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST was introduced in r217159 and r217138. This patch adds a missing cast from v2f32 to v1i64 which is causing some compilation failures. Also added test cases to cover various modimm types and BUILD_VECTORs with i64 elements. llvm-svn: 218751
* ADTTests/OptionalTest.cpp: Use LLVM_DELETED_FUNCTION.NAKAMURA Takumi2014-10-011-4/+4
| | | | llvm-svn: 218750
* [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM)Oliver Stannard2014-10-0111-18/+75
| | | | | | | | | The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16. FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the same target feature, and all double-precision operations are already disabled by the fp-only-sp target features. llvm-svn: 218747
* [mips] Fix disassembly of [ls][wd]c[23], cache, and pref ↵Daniel Sanders2014-10-016-12/+144
| | | | | | | | Fixes PR21015, and PR20993. Patch by Jun Koi llvm-svn: 218745
* [mips] For indirect calls we don't need $gp to point to .got. Mips linkerSasa Stankovic2014-10-018-19/+47
| | | | | | | | | doesn't generate lazy binding stub for a function whose address is taken in the program. Differential Revision: http://reviews.llvm.org/D5067 llvm-svn: 218744
* test: XFAIL the non-darwin gmlt test on darwinJustin Bogner2014-10-011-0/+3
| | | | | | | r218702 disabled a -gmlt optimization for darwin, but this means the non-darwin test isn't working there anymore. llvm-svn: 218742
* [MCJIT] Turn the getSymbolAddress free function created in r218626 into a staticLang Hames2014-10-013-7/+13
| | | | | | | | | | | member of RTDyldMemoryManager (and rename to getSymbolAddressInProcess). The functionality this provides is very specific to RTDyldMemoryManager, so it makes sense to keep it in that class to avoid accidental re-use. No functional change. llvm-svn: 218741
* Fix typo in comment from r218733Nick Lewycky2014-10-011-1/+1
| | | | llvm-svn: 218739
* InstrProf: Make coverage::Counter comparableJustin Bogner2014-10-011-0/+4
| | | | | | I'll be using this in a clang change very soon. llvm-svn: 218736
* [InstCombine] Fix for assert build failures caused by r218721Gerolf Hoflehner2014-10-011-1/+7
| | | | | | | | | The icmp-select-icmp optimization made the implicit assumption that the select-icmp instructions are in the same block and asserted on it. The fix explicitly checks for that condition and conservatively suppresses the optimization when it is violated. llvm-svn: 218735
* [x86] Teach the new vector shuffle lowering to be even more aggressiveChandler Carruth2014-10-012-17/+12
| | | | | | | | | | | | | | | | | | | | in exposing the scalar value to the broadcast DAG fragment so that we can catch even reloads and fold them into the broadcast. This is somewhat magical I'm afraid but seems to work. It is also what the old lowering did, and I've switched an old test to run both lowerings demonstrating that we get the same result. Unlike the old code, I'm not lowering f32 or f64 scalars through this path when we only have AVX1. The target patterns include pretty heinous code to re-cast those as shuffles when the scalar happens to not be spilled because AVX1 provides no broadcast mechanism from registers what-so-ever. This is terribly brittle. I'd much rather go through our generic lowering code to get this. If needed, we can add a peephole to get even more opportunities to broadcast-from-spill-slots that are exposed post-RA, but my suspicion is this just doesn't matter that much. llvm-svn: 218734
* [x86] Hoist the zext-lowering up in the v4i32 lowering routine -- it isChandler Carruth2014-10-012-11/+27
| | | | | | | | | | the same speed as pshufd but we can fold loads into the pmovzx instructions. This fixes some regressions that came up in the regression test suite for the new vector shuffle lowering. llvm-svn: 218733
* Add an emplace(...) method to llvm::Optional<T>.Jordan Rose2014-10-012-0/+105
| | | | | | | | | | | | | This can be used for in-place initialization of non-moveable types. For compilers that don't support variadic templates, only up to four arguments are supported. We can always add more, of course, but this should be good enough until we move to a later MSVC that has full support for variadic templates. Inspired by std::experimental::optional from the "Library Fundamentals" C++ TS. Reviewed by David Blaikie. llvm-svn: 218732
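To illustrate why in-place construction matters for non-movable types, here is a minimal sketch of an `emplace` built on placement new. It is not `llvm::Optional`: `TinyOptional` and `Immovable` are hypothetical names. Unlike the four-argument fallback described in r218732, it assumes a compiler with variadic template support.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Minimal stand-in for an optional type; hypothetical, not llvm::Optional.
template <typename T> class TinyOptional {
  alignas(T) unsigned char Storage[sizeof(T)];
  T *Ptr = nullptr; // Points into Storage while a value is alive.

public:
  TinyOptional() = default;
  TinyOptional(const TinyOptional &) = delete;
  TinyOptional &operator=(const TinyOptional &) = delete;
  ~TinyOptional() { reset(); }

  // Destroy the contained value, if any.
  void reset() {
    if (Ptr) {
      Ptr->~T();
      Ptr = nullptr;
    }
  }

  // Construct the value directly in Storage, so T needs neither a copy nor a
  // move constructor.
  template <typename... Args> void emplace(Args &&...A) {
    reset();
    Ptr = ::new (static_cast<void *>(Storage)) T(std::forward<Args>(A)...);
  }

  bool hasValue() const { return Ptr != nullptr; }

  T &get() {
    assert(Ptr && "TinyOptional has no value");
    return *Ptr;
  }
};

// A type that can be neither copied nor moved.
struct Immovable {
  int First, Second;
  Immovable(int F, int S) : First(F), Second(S) {}
  Immovable(const Immovable &) = delete;
  Immovable &operator=(const Immovable &) = delete;
};
```

Calling `emplace(1, 2)` on a `TinyOptional<Immovable>` builds the object in the optional's own storage. Assigning from a temporary would not compile for such a type.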
* Implement DW_TAG_subrange_type with DW_AT_count rather than DW_AT_upper_boundDavid Blaikie2014-10-014-14/+8
| | | | | | | | | | | | | | | This allows proper disambiguation of unbounded arrays and arrays of zero bound ("struct foo { int x[]; };" and "struct foo { int x[0]; }"). GCC instead produces an upper bound of -1 in the latter situation, but count seems tidier. This way lower_bound is provided if it's not the language default and count is provided if the count is known, otherwise it's omitted. Simple. If someone wants to look at rdar://problem/12566646 and see if this change is acceptable to that bug/fix, that might be helpful (see the empty-and-one-elem-array.ll test case which cites that radar). llvm-svn: 218726
* [AVX512] Remove space before \t in AsmStrings.Adam Nemet2014-10-011-6/+6
| | | | llvm-svn: 218725
* [x86] Teach the new vector shuffle lowering about VBROADCAST andChandler Carruth2014-10-0110-263/+437
| | | | | | | | | | VPBROADCAST. This has the somewhat expected pervasive impact. I don't know why I forgot about this. Everything seems good with lots of significant improvements in the tests. llvm-svn: 218724
* llvm-cov/CoverageReport.cpp: Quick fix for msvcrt, since width specifier "z" is unavailable.NAKAMURA Takumi2014-10-011-12/+12
| | | | | | Note, mingw uses its own printf instead of msvcrt. llvm-svn: 218723
* llvm/test/DebugInfo/X86/gmlt.test: Get rid of %llc_dwarf. It should not be used with -mtriple.NAKAMURA Takumi2014-10-011-2/+1
| | | | | | Also, remove object-emission. test/DebugInfo/X86 doesn't require it. llvm-svn: 218722
* [InstCombine] Optimize icmp-select-icmpGerolf Hoflehner2014-10-015-9/+277
| | | | | | | | | | | | | | | | | | | | | | | | | | | In special cases select instructions can be eliminated by replacing them with a cheaper bitwise operation even when the select result is used outside its home block. The instances implemented are patterns like

    %x = icmp.eq
    %y = select %x, %r, null
    %z = icmp.eq|neq %y, null
    br %z, true, false
  ==>
    %x = icmp.ne
    %y = icmp.eq %r, null
    %z = or %x, %y
    br %z, true, false

The optimization is integrated into the instruction combiner and performed only when all uses of the select result can be replaced by the select operand proper. For this dominator information is used and dominance is now a required analysis pass in the combiner.

The optimization itself is iterative. The critical step is to replace the select result with the non-constant select operand. So the select becomes local and the combiner iteratively works out simpler code pattern and eventually eliminates the select.

rdar://17853760

llvm-svn: 218721
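The equivalence behind the r218721 rewrite can be checked at the source level with a small C++ sketch. It models only the `icmp.eq %y, null` variant of the pattern, and the function names are hypothetical. It says nothing about how InstCombine implements the transformation.

```cpp
#include <cassert>
#include <initializer_list>

// Original shape: %y = select %x, %r, null ; %z = icmp eq %y, null
inline bool viaSelect(bool X, const int *R) {
  const int *Y = X ? R : nullptr;
  return Y == nullptr;
}

// Rewritten shape: %z = or (inverted %x), (icmp eq %r, null)
inline bool viaOr(bool X, const int *R) {
  bool NotX = !X;              // The first compare with its predicate inverted.
  bool RIsNull = R == nullptr; // Compare the select operand directly.
  return static_cast<bool>(NotX | RIsNull);
}

// Exhaustively compare both forms over the condition and a null/non-null
// pointer.
inline bool formsAgree() {
  int Object = 0;
  const int *Pointers[] = {nullptr, &Object};
  for (bool X : {false, true})
    for (const int *R : Pointers)
      if (viaSelect(X, R) != viaOr(X, R))
        return false;
  return true;
}
```

When the condition is false the select yields null, so the null test is trivially true. When it is true the result depends only on `%r`. That is why a single `or` is enough.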
* Omit DW_AT_inline under -gmlt to save a little more space.David Blaikie2014-09-302-2/+2
| | | | llvm-svn: 218719
* [BasicAA] Make better use of zext and sign informationHal Finkel2014-09-303-2/+95
| | | | | | | | | | | | | | | Two related things:

 1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones.

 2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias.

Patch by Nick White!

llvm-svn: 218714
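The first item in r218714 is easy to reproduce outside the compiler. The sketch below uses hypothetical helper names and is not the BasicAA code. It widens a 32-bit offset both ways to show how zero extension turns a negative offset into a large positive one.

```cpp
#include <cassert>
#include <cstdint>

// Widen by zero extension: the bit pattern is kept and the upper bits are
// cleared, so a negative 32-bit offset becomes a large positive value.
inline int64_t extendZero(int32_t Offset) {
  return static_cast<int64_t>(static_cast<uint32_t>(Offset));
}

// Widen by sign extension: the numeric value is preserved.
inline int64_t extendSign(int32_t Offset) {
  return static_cast<int64_t>(Offset);
}
```

For an offset of -4, `extendSign` returns -4 while `extendZero` returns 4294967292 (2^32 - 4). An alias query based on the zero-extended offset would therefore place the access far after the base pointer instead of just before it.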
* DebugInfo: Sink the code emitting DW_AT_APPLE_omit_frame_ptr down to a more common spot.David Blaikie2014-09-302-7/+5
| | | | | | | | | No functional change. Pre-emptive refactoring before I start pushing some of this subprogram creation down into DWARFCompileUnit so I can build different subprograms in the skeleton unit from the dwo unit for adding -gmlt-like data to the skeleton. llvm-svn: 218713
* MSBuild integration: fix the loop in install.batHans Wennborg2014-09-302-8/+18
| | | | | | | It would previously not continue the platforms loop unless it could find the latest toolset directory. llvm-svn: 218712
* [SimplifyCFG] threshold for folding branches with common destinationJingyue Wu2014-09-305-75/+127
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This patch adds a threshold that controls the number of bonus instructions allowed for folding branches with common destination. The original code allows at most one bonus instruction. With this patch, users can customize the threshold to allow multiple bonus instructions. The default threshold is still 1, so that the code behaves the same as before when users do not specify this threshold.

The motivation of this change is that tuning this threshold significantly (up to 25%) improves the performance of some CUDA programs in our internal code base. In general, branch instructions are very expensive for GPU programs. Therefore, it is sometimes worth trading more arithmetic computation for a more straightened control flow. Here's a reduced example:

    __global__ void foo(int a, int b, int c, int d, int e, int n,
                        const int *input, int *output) {
      int sum = 0;
      for (int i = 0; i < n; ++i)
        sum += (((i ^ a) > b) && (((i | c ) ^ d) > e)) ? 0 : input[i];
      *output = sum;
    }

The select statement in the loop body translates to two branch instructions "if ((i ^ a) > b)" and "if (((i | c) ^ d) > e)" which share a common destination. With the default threshold, SimplifyCFG is unable to fold them, because computing the condition of the second branch "(i | c) ^ d > e" requires two bonus instructions. With the threshold increased, SimplifyCFG can fold the two branches so that the loop body contains only one branch, making the code conceptually look like:

    sum += (((i ^ a) > b) & (((i | c ) ^ d) > e)) ? 0 : input[i];

Increasing the threshold significantly improves the performance of this particular example. In the configuration where both conditions are guaranteed to be true, increasing the threshold from 1 to 2 improves the performance by 18.24%. Even in the configuration where the first condition is false and the second condition is true, which favors shortcuts, increasing the threshold from 1 to 2 still improves the performance by 4.35%.

We are still looking for a good threshold and maybe a better cost model than just counting the number of bonus instructions. However, according to the above numbers, we think it is at least worth adding a threshold to enable more experiments and tuning. Let me know what you think. Thanks!

Test Plan: Added one test case to check the threshold is in effect
Reviewers: nadav, eliben, meheff, resistor, hfinkel
Reviewed By: hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: http://reviews.llvm.org/D5529
llvm-svn: 218711
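The two source-level forms quoted in r218711 can be compared on the host with a plain C++ sketch. The function names are hypothetical and `std::vector` stands in for the CUDA kernel's buffers. It only demonstrates that replacing `&&` with `&` preserves the result when both conditions are side-effect free. It does not model the SimplifyCFG threshold or GPU performance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Short-circuit form: the second condition is evaluated only when the first
// one holds, which conceptually needs two branches.
inline int sumShortCircuit(int A, int B, int C, int D, int E,
                           const std::vector<int> &Input) {
  int Sum = 0;
  for (std::size_t Idx = 0; Idx < Input.size(); ++Idx) {
    int I = static_cast<int>(Idx);
    Sum += (((I ^ A) > B) && (((I | C) ^ D) > E)) ? 0 : Input[Idx];
  }
  return Sum;
}

// Folded form: both conditions are always computed (the extra arithmetic is
// the "bonus instructions") and combined with a bitwise AND.
inline int sumFolded(int A, int B, int C, int D, int E,
                     const std::vector<int> &Input) {
  int Sum = 0;
  for (std::size_t Idx = 0; Idx < Input.size(); ++Idx) {
    int I = static_cast<int>(Idx);
    bool First = (I ^ A) > B;
    bool Second = ((I | C) ^ D) > E;
    Sum += (First & Second) ? 0 : Input[Idx];
  }
  return Sum;
}
```

Both functions return the same sum for any inputs. The difference the commit cares about is only how many branches the loop body needs.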
* [x86] Add AVX1 and AVX2 testing to all of the 128-bit shuffle testChandler Carruth2014-09-304-375/+855
| | | | | | | | | | | | | | cases. While clearly we don't need the AVX vector width, these ISA extensions often cause us to select different instructions and we should cover them even with the narrow vector width. Also, while here, nuke the stress_test2 contents. There is no reason to try to FileCheck this entire body when it is mostly a test for successfully surviving the code generator. llvm-svn: 218710
* [x86] Update the exact FileCheck syntax of the 256-bit and 512-bitChandler Carruth2014-09-305-1961/+1962
| | | | | | | | | | | shuffle tests to match that used in the script I posted and now used consistently in 128-bit tests. Nothing interesting changing here, just using the label name as the FileCheck label and a slightly more general comment marker consumption strategy. llvm-svn: 218709
* Adjust test case addition in r218702 so as not to fail when the X86 target isn't built.David Blaikie2014-09-303-2/+5
| | | | | | llvm-svn: 218708
* [x86] Rework all of the 128-bit vector shuffle tests with my handy testChandler Carruth2014-09-304-1222/+2541
| | | | | | | | | | | | | | | | | | | | | | | | | updating script so that they are more thorough and consistent.

Specific fixes here include:
- Actually test VEX-encoded AVX mnemonics.
- Actually use an SSE 4.1 run to test SSE 4.1 features!
- Correctly check instruction sequences from the start of the function.
- Elide the shuffle operands and comment designator in a consistent way.
- Test all of the architectures instead of just the ones I was motivated to manually author.

I've gone back through and fixed up any egregious issues I spotted. Let me know if I missed something you really dislike.

One downside to this is that we're now not as diligently using FileCheck variables for registers. I would be much more concerned with this if we had larger register usage, but there just aren't that interesting of register choices here and most of the registers are constrained by the ABI. Ultimately, I don't think this is likely to be the maintenance burden for these tests and updating them again should be straightforward.

llvm-svn: 218707
* Disable the -gmlt optimization implemented in r218129 under Darwin due to issues with dsymutil.David Blaikie2014-09-303-3/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | r218129 omits DW_TAG_subprograms which have no inlined subroutines when emitting -gmlt data. This makes -gmlt very low cost for -O0 builds. Darwin's dsymutil reasonably considers a CU empty if it has no subprograms (which occurs with the above optimization in -O0 programs without any force_inline function calls) and drops the line table, CU, and everything in this situation, making backtraces impossible. Until dsymutil is modified to account for this, disable this optimization on Darwin to preserve the desired functionality. (see r218545, which should be reverted after this patch, for other discussion/details) Footnote: In the long term, it doesn't look like this scheme (of simplified debug info to describe inlining to enable backtracing) is tenable, it is far too size inefficient for optimized code (the DW_TAG_inlined_subroutines, even once compressed, are nearly twice as large as the line table itself (also compressed)) and we'll be considering things like Cary's two level line table proposal to encode all this information directly in the line table. llvm-svn: 218702
* Use the target-specified iteration count to opt out of any further refinement of an estimate. NFC.Sanjay Patel2014-09-301-60/+62
| | | | | | llvm-svn: 218700
* Split the estimate() interface into separate functions for each type. NFC.Sanjay Patel2014-09-304-34/+61
| | | | | | | | | | It was hacky to use an opcode as a switch because it won't always match (rsqrte != sqrte), and it looks like we'll need to add more special casing per arch than I had hoped for. E.g., x86 will prefer a different NR estimate implementation. ARM will want to use its 'step' instructions. There also don't appear to be any new estimate instructions in any arch in a long, long time. Altivec vloge and vexpte may have been the first and last in that field... llvm-svn: 218698
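For readers unfamiliar with the "NR" refinement these two commits refer to, the textbook Newton-Raphson steps for a reciprocal and a reciprocal square root estimate look roughly as follows. This is a scalar sketch with hypothetical function names, not the DAG combiner code. Passing an iteration count of zero models a target opting out of further refinement.

```cpp
#include <cassert>
#include <cmath>

// Refine an estimate of 1/A. Each Newton-Raphson step computes
// X' = X * (2 - A * X). Zero iterations returns the estimate unchanged.
inline double refineReciprocal(double A, double Estimate, unsigned Iterations) {
  double X = Estimate;
  for (unsigned I = 0; I < Iterations; ++I)
    X = X * (2.0 - A * X);
  return X;
}

// Refine an estimate of 1/sqrt(A). Each step computes
// X' = X * (1.5 - 0.5 * A * X * X).
inline double refineRSqrt(double A, double Estimate, unsigned Iterations) {
  double X = Estimate;
  for (unsigned I = 0; I < Iterations; ++I)
    X = X * (1.5 - 0.5 * A * X * X);
  return X;
}
```

The two recurrences differ, which is consistent with the commit's point that reciprocal and reciprocal square root estimates deserve separate entry points rather than a single opcode-switched function.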