bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[Hexagon] Adding XTYPE/MPY intrinsic tests and some missing multiply ↵	Colin LeMahieu	2015-01-28	4	-4/+52
\| \| \| \| \| \|	instructions. llvm-svn: 227347
*	Refactoring llvm command line parsing and option registration.	Chris Bieneman	2015-01-28	1	-170/+158
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The primary goal of this patch is to remove the need for MarkOptionsChanged(). That goal is accomplished by having addOption and removeOption properly sort the options. This patch puts the new add and remove functionality on a CommandLineParser class that is a placeholder. Some of the functionality in this class will need to be merged into the OptionRegistry, and other bits can hopefully be in a better abstraction. This patch also removes the RegisteredOptionList global, and the need for cl::Option objects to be linked list nodes. The changes in CommandLineTest.cpp are required because these changes shift when we validate that options are not duplicated. Before this change duplicate options were only found during certain cl API calls (like cl::ParseCommandLine). With this change duplicate options are found during option construction. Reviewers: dexonsmith, chandlerc, pete Reviewed By: pete Subscribers: pete, majnemer, llvm-commits Differential Revision: http://reviews.llvm.org/D7132 llvm-svn: 227345
*	[Hexagon] Deleting a lot of old variants of intrinsics and updating references.	Colin LeMahieu	2015-01-28	5	-237/+78
\| \| \| \|	llvm-svn: 227338
*	[Hexagon] Converting XTYPE/BIT intrinsic patterns and adding tests.	Colin LeMahieu	2015-01-28	2	-34/+143
\| \| \| \|	llvm-svn: 227335
*	use SDValue methods directly instead of getNode()->* ; NFCI	Sanjay Patel	2015-01-28	1	-10/+10
\| \| \| \|	llvm-svn: 227334
*	Simplify code. NFC.	Rafael Espindola	2015-01-28	1	-3/+1
\| \| \| \|	llvm-svn: 227333
*	[Hexagon] Replacing XTYPE/SHIFT intrinsic patternss. Adding tests and ↵	Colin LeMahieu	2015-01-28	5	-194/+225
\| \| \| \| \| \|	missing instructions with tests. llvm-svn: 227330
*	[mips][microMIPS] Implement LWGP instruction	Jozef Kolek	2015-01-28	6	-1/+100
\| \| \| \| \| \|	Differential Revision: http://reviews.llvm.org/D6650 llvm-svn: 227325
*	[Hexagon] Replacing intrinsics for halfword adds and max/min word/dword.	Colin LeMahieu	2015-01-28	2	-82/+55
\| \| \| \|	llvm-svn: 227322
*	Fix LLVMSetMetadata and LLVMAddNamedMetadataOperand for single value MDNodes	Bjorn Steinbrink	2015-01-28	1	-4/+18
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: MetadataAsValue uses a canonical format that strips the MDNode if it contains only a single constant value. This triggers an assertion when trying to cast the value to a MDNode. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7165 llvm-svn: 227319
*	[x32] Change the condition from bitness to LP64 for TCRETURNdi64.	Michael Kuperstein	2015-01-28	1	-2/+2
\| \| \| \| \| \|	TCRETURNmi64, which was mistakenly changed in r227307 will wait for another day. llvm-svn: 227317
*	R600: Move DataLayout to AMDGPUTargetMachine	Tom Stellard	2015-01-28	4	-22/+23
\| \| \| \| \| \| \| \|	This is a follow up to r227113. It is now required to use the amdgcn target for SI and newer GPUs. llvm-svn: 227316
*	R600: Use a Southern Islands GPU as the default for the amdgcn target	Tom Stellard	2015-01-28	2	-3/+7
\| \| \| \|	llvm-svn: 227314
*	Correct the AggressiveAntiDepBreaker's handling of subregisters defining ↵	Hal Finkel	2015-01-28	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	super registers As the AggressiveAntiDepBreaker iterated backward through a scheduling region, we must leave super registers live through subregister definitions so that all relevant subregister definitions are renamed together. The problem was that we were also discarding sub-register use locations as the sub-registers are redefined. The result is that we'd rename the super register along with some, but not all, subregister definitions. R0_D = {R0_L, R1_L} R0_L = {R0_S, R1_S} %R0_L<def> = TRLi9 16, pred:8, pred:%noreg %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill> %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S Anti: %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L) Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use) Anti: %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D) Antidep reg: R0_L (real dependency) Use Groups: R0_L=g210->g224(last-use) R0_S->g225(last-use) R1_S->g226(last-use) Anti: %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg Def Groups: R1_L=g219->g210(via R0_D) Antidep reg: R1_L (real dependency) Use Groups: R1_L=g210->g229(last-use) Anti: %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill> Def Groups: R0_L=g224->g225(via R0_S)->g210(via R0_D)->g226(via R1_S)->g226(via R0_D) Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g226->g230(last-use) R0_L=g226->g231(last-use) R1_S->g232(last-use) Anti: %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg Def Groups: R1_L=g229->g226(via R0_D) Antidep reg: R1_L Use Groups: R1_L=g226->g233(last-use) R0_S=g230 Anti: %R0_L<def> = TRLi9 16, pred:8, pred:%noreg Def Groups: R0_L=g231->g230(via R0_S)->g226(via R0_D)->g232(via R1_S)->g232(via R0_D) Antidep reg: R0_L Rename Candidates for Group g232: R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D R0_L: elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S Find Registers: [R12_D: R12_D R12_L R12_S] Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(2 refs) R0_S->R12_S(2 refs) Use Groups: ... %R12_L<def> = TRLi9 16, pred:8, pred:%noreg %R1_L<def> = LSRLrr %R1_L<kill>, %R12_S, pred:8, pred:%noreg %R0_L<def> = LSRLrr %R2_L<kill>, %R12_S, pred:8, pred:%noreg, %R12_L<imp-use> %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg %R4_D<def> = ASRDrr %R12_D<kill>, %R6_S With this change, we now produce: Anti: %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L) Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use) Anti: %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D) Antidep reg: R0_L (real dependency) Use Groups: R0_L=g210 Anti: %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg Def Groups: R1_L=g219->g210(via R0_D) Antidep reg: R1_L (real dependency) Use Groups: R1_L=g210 Anti: %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill> Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D) Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g210 R0_L=g210 Anti: %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg Def Groups: R1_L=g210->g210(via R0_D) Antidep reg: R1_L Use Groups: R1_L=g210 R0_S=g210 Anti: %R0_L<def> = TRLi9 16, pred:8, pred:%noreg Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D) Antidep reg: R0_L Rename Candidates for Group g210: R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D R0_L: elcIntRegs elcIntAIRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L R1_L: elcIntRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S Find Registers: [R12_D: R12_D R12_L R13_L R12_S] Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(7 refs) R1_L->R13_L(5 refs) R0_S->R12_S(2 refs) Use Groups: ... %R12_L<def> = TRLi9 16, pred:8, pred:%noreg %R13_L<def> = LSRLrr %R13_L<kill>, %R12_S, pred:8, pred:%noreg %R12_L<def> = LSRLrr %R2_L<kill>, %R12_S<kill>, pred:8, pred:%noreg, %R12_L<imp-use,kill> %R13_L<def> = ANDLri %R13_L<kill>, 2047, pred:8, pred:%noreg %R12_L<def> = ANDLri %R12_L<kill>, 2047, pred:8, pred:%noreg %R4_D<def> = ASRDrr %R12_D, %R6_S, %R12_L<imp-def>, %R12_S<imp-def>, %R13_S<imp-def> As demonstrated by this example, this is also somewhat unfortunate, because there is actually no need to rename the super register in this case (it is fully covered by later subregister definitions), but we don't seem to track enough information here to exploit that either. Thanks to Daniil Troshkov for reporting the issue. The debug outputs in this commit message are from Daniil. llvm-svn: 227311
*	[X86] Reduce some 32-bit imuls into lea + shl	Michael Kuperstein	2015-01-28	1	-3/+2
\| \| \| \| \| \| \| \|	Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well. Differential Revision: http://reviews.llvm.org/D7196 llvm-svn: 227308
*	[x32] Enable sibcall optimization on x32.	Michael Kuperstein	2015-01-28	2	-6/+7
\| \| \| \| \| \| \| \|	This includes two things: 1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness). 2) Allow LEA64_32 in MatchingStackOffset. llvm-svn: 227307
*	AVX-512: Added FMA intrinsics with rounding mode	Elena Demikhovsky	2015-01-28	5	-137/+148
\| \| \| \| \| \| \| \| \|	By Asaf Badouh and Elena Demikhovsky Added special nodes for rounding: FMADD_RND, FMSUB_RND.. It will prevent merge between nodes with rounding and other standard nodes. llvm-svn: 227303
*	[X86] Teach disassembler to handle illegal immediates on AVX512 integer ↵	Craig Topper	2015-01-28	4	-6/+141
\| \| \| \| \| \|	compare instructions. llvm-svn: 227302
*	[X86] Merge printSSECC and printAVXCC. They only differed by an assertion.	Craig Topper	2015-01-28	5	-36/+10
\| \| \| \|	llvm-svn: 227301
*	[LPM] Rip all of ManagedStatic and ThreadLocal out of the pretty stack	Chandler Carruth	2015-01-28	1	-21/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	tracing code. Managed static was just insane overhead for this. We took memory fences and external function calls in every path that pushed a pretty stack frame. This includes a multitude of layers setting up and tearing down passes, the parser in Clang, everywhere. For the regression test suite or low-overhead JITs, this was contributing to really significant overhead. Even the LLVM ThreadLocal is really overkill here because it uses pthread_{set,get}_specific logic, and has careful code to both allocate and delete the thread local data. We don't actually want any of that, and this code in particular has problems coping with deallocation. What we want is a single TLS pointer that is valid to use during global construction and during global destruction, any time we want. That is exactly what every host compiler and OS we use has implemented for a long time, and what was standardized in C++11. Even though not all of our host compilers support the thread_local keyword, we can directly use the platform-specific keywords to get the minimal functionality needed. Provided this limited trial survives the build bots, I will move this to Compiler.h so it is more widely available as a light weight if limited alternative to the ThreadLocal class. Many thanks to David Majnemer for helping me think through the implications across platforms and craft the MSVC-compatible syntax. The end result is substantially faster. When running llc in a tight loop over a small IR file targeting the aarch64 backend, this improves its performance by over 10% for me. It also seems likely to fix the remaining regressions seen by JIT users with threading enabled. This may actually have more impact on real-world compile times due to the use of the pretty stack tracing utility throughout the rest of Clang or LLVM, but I've not collected any detailed measurements. llvm-svn: 227300
*	[LPM] A targeted but somewhat horrible fix to the legacy pass manager's	Chandler Carruth	2015-01-28	1	-14/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	querying of the pass registry. The pass manager relies on the static registry of PassInfo objects to perform all manner of its functionality. I don't understand why it does much of this. My very vague understanding is that this registry is touched both during static initialization and while each pass is being constructed. As a consequence it is hard to make accessing it not require a acquiring some lock. This lock ends up in the hot path of setting up, tearing down, and invaliditing analyses in the legacy pass manager. On most systems you can observe this as a non-trivial % of the time spent in 'ninja check-llvm'. However, I haven't really seen it be more than 1% in extreme cases of compiling more real-world software, including LTO. Unfortunately, some of the GPU JITs are seeing this taking essentially all of their time because they have very small IR running through a small pass pipeline very many times (at least, this is the vague understanding I have of it). This patch tries to minimize the cost of looking up PassInfo objects by leveraging the fact that the objects themselves are immutable and they are allocated separately on the heap and so don't have their address change. It also requires a change I made the last time I tried to debug this problem which removed the ability to de-register a pass from the registry. This patch creates a single access path to these objects inside the PMTopLevelManager which memoizes the result of querying the registry. This is somewhat gross as I don't really know if PMTopLevelManager is the right place to put it, and I dislike using a mutable member to memoize things, but it seems to work. For long-lived pass managers this should completely eliminate the cost of acquiring locks to look into the pass registry once the memoized cache is warm. For 'ninja check' I measured about 1.5% reduction in CPU time and in total time on a machine with 32 hardware threads. For normal compilation, I don't know how much this will help, sadly. We will still pay the cost while we populate the memoized cache. I don't think it will hurt though, and for LTO or compiles with many small functions it should still be a win. However, for tight loops around a pass manager with many passes and small modules, this will help tremendously. On the AArch64 backend I saw nearly 50% reductions in time to complete 2000 cycles of spinning up and tearing down the pipeline. Measurements from Owen of an actual long-lived pass manager show more along the lines of 10% improvements. Differential Revision: http://reviews.llvm.org/D7213 llvm-svn: 227299
*	Fold fcmp in cases where value is provably non-negative. By Arch Robison.	Elena Demikhovsky	2015-01-28	2	-0/+67
\| \| \| \| \| \| \| \| \| \| \| \|	This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where: - the predicate is olt or uge - the first operand is provably not less than zero - the second operand is zero The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(xx+yy).. http://reviews.llvm.org/D6972 llvm-svn: 227298
*	[LPM] Stop using the string based preservation API. It is an	Chandler Carruth	2015-01-28	7	-17/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	abomination. For starters, this API is incredibly slow. In order to lookup the name of a pass it must take a memory fence to acquire a pointer to the managed static pass registry, and then potentially acquire locks while it consults this registry for information about what passes exist by that name. This stops the world of LLVMs in your process no matter how little they cared about the result. To make this more joyful, you'll note that we are preserving many passes which do not exist any more, or are not even analyses which one might wish to have be preserved. This means we do all the work only to say "nope" with no error to the user. String-based APIs are a bad idea. String-based APIs that cannot produce any meaningful error are an even worse idea. =/ I have a patch that simply removes this API completely, but I'm hesitant to commit it as I don't really want to perniciously break out-of-tree users of the old pass manager. I'd rather they just have to migrate to the new one at some point. If others disagree and would like me to kill it with fire, just say the word. =] llvm-svn: 227294
*	Migrate AArch64 except for TTI and AsmPrinter away from getSubtargetImpl.	Eric Christopher	2015-01-28	10	-48/+29
\| \| \| \|	llvm-svn: 227293
*	Add description to assert	David Blaikie	2015-01-28	1	-1/+1
\| \| \| \|	llvm-svn: 227291
*	PR22356: DebugInfo: Handle the size of a member where the type of that ↵	David Blaikie	2015-01-28	1	-5/+2
\| \| \| \| \| \|	member is a typedef (or other sugar) of a declaration. llvm-svn: 227290
*	Revert r227247 and r227228: "Add weak symbol support to RuntimeDyld".	Lang Hames	2015-01-28	3	-31/+1
\| \| \| \| \| \| \| \| \|	This has wider implications than I expected when I reviewed the patch: It can cause JIT crashes where clients have used the default value for AbortOnFailure during symbol lookup. I'm currently investigating alternative approaches and I hope to have this back in tree soon. llvm-svn: 227287
*	Move EH personality type classification to Analysis/LibCallSemantics.h	Reid Kleckner	2015-01-28	2	-28/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Also add enum types for __C_specific_handler and _CxxFrameHandler3 for which we know a few things. Reviewers: majnemer Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7214 llvm-svn: 227284
*	Revert r227242 - Merge vector stores into wider vector stores (PR21711).	Quentin Colombet	2015-01-27	1	-54/+30
\| \| \| \| \| \| \|	This commit creates infinite loop in DAG combine for in the LLVM test-suite for aarch64 with mcpu=cylcone (just having neon may be enough to expose this). llvm-svn: 227272
*	[mips] Use __clear_cache builtin instead of cacheflush()	Petar Jovanovic	2015-01-27	1	-13/+2
\| \| \| \| \| \| \| \| \|	Use __clear_cache builtin instead of cacheflush() in Unix Memory::InvalidateInstructionCache(). Differential Revision: http://reviews.llvm.org/D7198 llvm-svn: 227269
*	SymbolRewriter: allow rewriting with comdats	Saleem Abdulrasool	2015-01-27	1	-0/+20
\| \| \| \| \| \| \| \| \|	COMDATs must be identically named to the symbol. When support for COMDATs was introduced, the symbol rewriter was not updated, resulting in rewriting failing for symbols which were placed into COMDATs. This corrects the behaviour and adds test cases for this. llvm-svn: 227261
*	SymbolRewriter: prevent unnecessary rewrite	Saleem Abdulrasool	2015-01-27	1	-0/+3
\| \| \| \| \| \| \|	The rewrite for the pattern based rewrite is unnecessary if the existing name matches the pattern. llvm-svn: 227260
*	remove function names from comments; NFC	Sanjay Patel	2015-01-27	1	-8/+6
\| \| \| \|	llvm-svn: 227256
*	Re-landing changes to use ArrayRef instead of SmallVectorImpl, and new API test.	Chris Bieneman	2015-01-27	1	-2/+1
\| \| \| \| \| \|	This contains the changes from r227148 & r227154, and also fixes to the test case to properly clean up the stack options. llvm-svn: 227255
*	[fuzzer] properly enable asan's coverage feedback	Kostya Serebryany	2015-01-27	1	-1/+4
\| \| \| \|	llvm-svn: 227254
*	fix typos; NFC	Sanjay Patel	2015-01-27	1	-4/+4
\| \| \| \|	llvm-svn: 227253
*	Add a Fuzzer library	Kostya Serebryany	2015-01-27	17	-0/+799
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: A simple genetic in-process coverage-guided fuzz testing library. I've used this fuzzer to test clang-format (it found 12+ bugs, thanks djasper@ for the fixes!) and it may also help us test other parts of LLVM. So why not keep it in the LLVM repository? I plan to add the cmake build rules later (in a separate patch, if that's ok) and also add a clang-format-fuzzer target. See README.txt for details. Test Plan: Tests will follow separately. Reviewers: djasper, chandlerc, rnk Reviewed By: rnk Subscribers: majnemer, ygribov, dblaikie, llvm-commits Differential Revision: http://reviews.llvm.org/D7184 llvm-svn: 227252
*	[SimplifyLibCalls] Don't confuse strcpy_chk for stpcpy_chk.	Ahmed Bougacha	2015-01-27	1	-10/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was introduced in a faulty refactoring (r225640, mea culpa): the tests weren't testing the return values, so, for both __strcpy_chk and __stpcpy_chk, we would return the end of the buffer (matching stpcpy) instead of the beginning (for strcpy). The root cause was the prefix "__" being ignored when comparing, which made us always pick LibFunc::stpcpy_chk. Pass the LibFunc::Func directly to avoid this kind of error. Also, make the testcases as explicit as possible to prevent this. The now-useful testcases expose another, entangled, stpcpy problem, with the further simplification. This was introduced in a refactoring (r225640) to match the original behavior. However, this leads to problems when successive simplifications generate several similar instructions, none of which are removed by the custom replaceAllUsesWith. For instance, InstCombine (the main user) doesn't erase the instruction in its custom RAUW. When trying to simplify say __stpcpy_chk: - first, an stpcpy is created (fortified simplifier), - second, a memcpy is created (normal simplifier), but the stpcpy call isn't removed. - third, InstCombine later revisits the instructions, and simplifies the first stpcpy to a memcpy. We now have two memcpys. llvm-svn: 227250
*	Teach IRCE to look at branch weights when recognizing range checks	Sanjoy Das	2015-01-27	1	-3/+14
\| \| \| \| \| \| \| \| \| \| \|	Splitting a loop to make range checks redundant is profitable only if the range check "never" fails. Make this fact a part of recognizing a range check -- a branch is a range check only if it is expected to pass (via branch_weights metadata). Differential Revision: http://reviews.llvm.org/D7192 llvm-svn: 227249
*	Revert "[x86] Combine x86mmx/i64 to v2i64 conversion to use scalar_to_vector"	Alexey Samsonov	2015-01-27	1	-29/+0
\| \| \| \| \| \|	This reverts commits r226953 and r226974. llvm-svn: 227248
*	dd the option, -link-opt-hints to llvm-objdump used with -macho to print the	Kevin Enderby	2015-01-27	1	-2/+25
\| \| \| \| \| \|	Mach-O AArch64 linker optimization hints for ADRP code optimization. llvm-svn: 227246
*	Merge vector stores into wider vector stores (PR21711)	Sanjay Patel	2015-01-27	1	-30/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch resolves part of PR21711 ( http://llvm.org/bugs/show_bug.cgi?id=21711 ). The 'f3' test case in that report presents a situation where we have two 128-bit stores extracted from a 256-bit source vector. Instead of producing this: vmovaps %xmm0, (%rdi) vextractf128 $1, %ymm0, 16(%rdi) This patch merges the 128-bit stores into a single 256-bit store: vmovups %ymm0, (%rdi) Differential Revision: http://reviews.llvm.org/D7208 llvm-svn: 227242
*	tsan: properly instrument unaligned accesses	Dmitry Vyukov	2015-01-27	1	-1/+22
\| \| \| \| \| \| \| \| \| \| \|	If a memory access is unaligned, emit __tsan_unaligned_read/write callbacks instead of __tsan_read/write. Required to change semantics of __tsan_unaligned_read/write to not do the user memory. But since they were unused (other than through __sanitizer_unaligned_load/store) this is fine. Fixes long standing issue 17: https://code.google.com/p/thread-sanitizer/issues/detail?id=17 llvm-svn: 227231
*	[ExecutionEngine] Add weak symbol support to RuntimeDyld	Keno Fischer	2015-01-27	3	-1/+31
\| \| \| \| \| \| \| \| \| \| \|	Support weak symbols by first looking up if there is an externally visible symbol we can find, and only if that fails using the one in the object file we're loading. Reviewed By: lhames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D6950 llvm-svn: 227228
*	[ExecutionEngine] FindFunctionNamed: Skip declarations	Keno Fischer	2015-01-27	2	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Basically all other methods that look up functions by name skip them if they are mere declarations. Do the same in FindFunctionNamed. Reviewers: lhames Reviewed By: lhames Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D7068 llvm-svn: 227227
*	[mips] Add range checks and transformation to octeon instructions in AsmParser.	Kai Nacke	2015-01-27	1	-0/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds range checks to the immediate operands of octeon instructions in the AsmParser. Like gas, it applies the following transformations if the immediate is to large: bbit0 $8, 42, foo => bbit032 $8, 10, foo bbit1 $8, 46, foo => bbit132 $8, 14, foo cins $8, $31, 32, 31 => cins32 $8, $31, 0, 31 exts $7, $4, 54, 9 => exts32 $7, $4, 22, 9 Reviewed By: dsanders Differential Revision: http://reviews.llvm.org/D7080 llvm-svn: 227225
*	R600/SI: Fix MIN3/MAX3 on VI, define MED3	Marek Olsak	2015-01-27	1	-9/+16
\| \| \| \|	llvm-svn: 227213
*	R600/SI: Don't set patterns for chip-specific instructions while having pseudos	Marek Olsak	2015-01-27	1	-50/+43
\| \| \| \| \| \| \| \| \| \| \|	Only pseudos have patterns on them. Also don't set the asm string for VINTRP_Pseudo. All pseudos should have empty asm. This matches what all other multiclasses do. llvm-svn: 227212
*	R600/SI: Add VI versions of LDS atomics	Marek Olsak	2015-01-27	3	-120/+139
\| \| \| \| \| \| \|	Each class is split into two: one adds let statements around non-pseudos, and the other one specifies the parameters. llvm-svn: 227211
*	R600/SI: Add VI versions of MUBUF atomics	Marek Olsak	2015-01-27	2	-73/+80
\| \| \| \|	llvm-svn: 227210