| Commit message (Collapse) | Author | Age | Files | Lines | 
| ... |  | 
| | 
| 
| 
|  | 
llvm-svn: 221151
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When LLVM emits DWARF call frame information, it currently creates a local,
section-relative symbol in the code section, which is pointed to by a
relocation on the .eh_frame section. However, for C++ we emit some functions in
section groups, and the SysV ABI has some rules to make it easier to remove
these sections
(http://www.sco.com/developers/gabi/latest/ch4.sheader.html#section_group_rules):
  A symbol table entry with STB_LOCAL binding that is defined relative to one
  of a group's sections, and that is contained in a symbol table section that is
  not part of the group, must be discarded if the group members are discarded.
  References to this symbol table entry from outside the group are not allowed.
This means that we need to use the function symbol for the relocation, not a
temporary symbol.
There was a comment in the code claiming that the local symbol was used to
avoid creating a relocation, but a relocation must be created anyway as the
code and CFI are in different sections.
llvm-svn: 221150
 | 
| | 
| 
| 
| 
| 
|  | 
DwarfCompileUnit.
llvm-svn: 221123
 | 
| | 
| 
| 
|  | 
llvm-svn: 221095
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Currently we only need to emit skeleton strings into the CU header and
we do this by explicitly calling "addLocalString". With gmlt-in-fission,
we'll be emitting a bunch of other strings from other codepaths where
it's not statically known that these strings will be local or not.
Introduce a virtual function to indicate whether this unit is a DWO unit
or not (I'm not sure if we have a good term for this, the
opposite/alternative to 'skeleton' unit) and use that to generalize the
string emission logic so that strings can be correctly emitted in both
the skeleton and dwo unit when in split dwarf mode.
And to demonstrate that this works, switch the existing special callers
of addLocalString in the skeleton builder to addString - and they still
work. Yay.
llvm-svn: 221094
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
DwarfCompileUnit
This is a useful distinction/invariant/delination to make because
LineTablesOnly mode is never relevant to type units, so it's clear that
we're not doing weird line-tables-only-with-types by making this API
choice.
It also lays the foundations nicely for adding gmlt-like data to fission
skeleton CUs while limiting the effects to CUs and not TUs.
llvm-svn: 221093
 | 
| | 
| 
| 
|  | 
llvm-svn: 221092
 | 
| | 
| 
| 
|  | 
llvm-svn: 221090
 | 
| | 
| 
| 
|  | 
llvm-svn: 221089
 | 
| | 
| 
| 
|  | 
llvm-svn: 221088
 | 
| | 
| 
| 
|  | 
llvm-svn: 221087
 | 
| | 
| 
| 
|  | 
llvm-svn: 221086
 | 
| | 
| 
| 
|  | 
llvm-svn: 221085
 | 
| | 
| 
| 
| 
| 
|  | 
don't have variables
llvm-svn: 221084
 | 
| | 
| 
| 
| 
| 
|  | 
DwarfCompileUnit
llvm-svn: 221083
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
(these will shortly become virtual, with a null implementation in
DwarfUnit (since type units don't have accelerator tables in the current
schema) and the current implementation down in DwarfCompileUnit, moving
the actual maps there too)
llvm-svn: 221082
 | 
| | 
| 
| 
| 
| 
|  | 
accessor
llvm-svn: 221080
 | 
| | 
| 
| 
| 
| 
|  | 
skipped in initSection
llvm-svn: 221079
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This would help catch cases where we might otherwise try to reference a
dwo CU label, which would be weird - because without relocations in the
dwo file it's not generally meaningful to talk about the CU offsets
there (or, if it is, we can do so in absolute terms without using a
relocation to compute it).
llvm-svn: 221078
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
skeleton explicitly.
Confusing to do this two different ways - I'm not too wedded to either
one, but here goes.
llvm-svn: 221076
 | 
| | 
| 
| 
| 
| 
|  | 
place it's needed.
llvm-svn: 221075
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This allows the CU label to be emitted only for compile units, as
they're the only ones that need it (so they can be referenced from
pubnames)
llvm-svn: 221072
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
directly
This was a compile-unit specific label (unused in type units) and seems
unnecessary anyway when we can more easily directly compute the size of
the compile unit.
llvm-svn: 221067
 | 
| | 
| 
| 
|  | 
llvm-svn: 221062
 | 
| | 
| 
| 
| 
| 
|  | 
instead of DwarfUnit*) now that it's specific to DwarfCompileUnit anyway.
llvm-svn: 221060
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Type units no longer have skeletons and it's misleading to be able to
query for a type unit's skeleton (it might incorrectly lead one to
conclude that if a unit doesn't have a skeleton it's not in a .dwo
file... ).
llvm-svn: 221055
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is the first big step to allowing gmlt-like inline scope
information in the skeleton CU. While this commit doesn't change the
functionality, it's only a small step to call
"constructAbstractSubprogramDIE" on both the InfoHolder and the
SkeletonHolder (when in use) and that will at least create the abstract
SP dies in that case, though still not creating the other subprograms.
llvm-svn: 221051
 | 
| | 
| 
| 
|  | 
llvm-svn: 221037
 | 
| | 
| 
| 
|  | 
llvm-svn: 221036
 | 
| | 
| 
| 
| 
| 
|  | 
the list of all units.
llvm-svn: 221034
 | 
| | 
| 
| 
|  | 
llvm-svn: 221033
 | 
| | 
| 
| 
| 
| 
|  | 
having to cast from DwarfUnit* on every call.
llvm-svn: 221031
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Change `Instruction::getMetadata()` to return `Value` as part of
PR21433.
Update most callers to use `Instruction::getMDNode()`, which wraps the
result in a `cast_or_null<MDNode>`.
llvm-svn: 221024
 | 
| | 
| 
| 
|  | 
llvm-svn: 221010
 | 
| | 
| 
| 
| 
| 
|  | 
DwarfCompileUnit
llvm-svn: 221005
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch adds an optimization in CodeGenPrepare to move an extractelement
right before a store when the target can combine them.
The optimization may promote any scalar operations to vector operations in the
way to make that possible.
** Context **
Some targets use different register files for both vector and scalar operations.
This means that transitioning from one domain to another may incur copy from one
register file to another. These copies are not coalescable and may be expensive.
For example, according to the scheduling model, on cortex-A8 a vector to GPR
move is 20 cycles.
** Motivating Example **
Let us consider an example:
define void @foo(<2 x i32>* %addr1, i32* %dest) {
 %in1 = load <2 x i32>* %addr1, align 8
 %extract = extractelement <2 x i32> %in1, i32 1
 %out = or i32 %extract, 1
 store i32 %out, i32* %dest, align 4
 ret void
}
As it is, this IR generates the following assembly on armv7:
  vldr  d16, [r0]            @vector load  
  vmov.32 r0, d16[1]  @ cross-register-file copy: 20 cycles
  orr r0, r0, #1           @ scalar bitwise or
  str r0, [r1]               @ scalar store
  bx  lr
Whereas we could generate much faster code:
  vldr  d16, [r0]               @ vector load
  vorr.i32  d16, #0x1     @ vector bitwise or
  vst1.32 {d16[1]}, [r1:32] @ vector extract + store
  bx  lr
Half of the computation made in the vector is useless, but this allows to get
rid of the expensive cross-register-file copy.
** Proposed Solution **
To avoid this cross-register-copy penalty, we promote the scalar operations to
vector operations. The penalty will be removed if we manage to promote the whole
chain of computation in the vector domain.
Currently, we do that only when the chain of computation ends by a store and the
target is able to combine an extract with a store.
Stores are the most likely candidates, because other instructions produce values
that would need to be promoted and so, extracted as some point[1]. Moreover,
this is customary that targets feature stores that perform a vector extract (see
AArch64 and X86 for instance).
The proposed implementation relies on the TargetTransformInfo to decide whether
or not it is beneficial to promote a chain of computation in the vector domain.
Unfortunately, this interface is rather inaccurate for this level of details and
although this optimization may be beneficial for X86 and AArch64, the inaccuracy
will lead to the optimization being too aggressive.
Basically in TargetTransformInfo, everything that is legal has a cost of 1,
whereas, even if a vector type is legal, usually a vector operation is slightly
more expensive than its scalar counterpart. That will lead to too many
promotions that may not be counter balanced by the saving of the
cross-register-file copy. For instance, on AArch64 this penalty is just 4
cycles.
For now, the optimization is just enabled for ARM prior than v8, since those
processors have a larger penalty on cross-register-file copies, and the scope is
limited to basic blocks. Because of these two factors, we limit the effects of
the inaccuracy. Indeed, I did not want to build up a fancy cost model with block
frequency and everything on top of that.
[1] We can imagine targets that can combine an extractelement with  other
instructions than just stores. If we want to go into that direction, the current
interfaces must be augmented and, moreover, I think this becomes a global isel
problem.
Differential Revision: http://reviews.llvm.org/D5921
<rdar://problem/14170854>
llvm-svn: 220978
 | 
| | 
| 
| 
| 
| 
|  | 
Noticed in post-commit review by Adrian Prantl.
llvm-svn: 220967
 | 
| | 
| 
| 
| 
| 
|  | 
Initial patch by Oleg Ranevskyy.
llvm-svn: 220945
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
r212242 introduced a legalizer hook, originally to let AArch64 widen
v1i{32,16,8} rather than scalarize, because the legalizer expected, when
scalarizing the result of a conversion operation, to already have
scalarized the operands.  On AArch64, v1i64 is legal, so that commit
ensured operations such as v1i32 = trunc v1i64 wouldn't assert.
It did that by choosing to widen v1 types whenever possible.  However,
v1i1 types, for which there's no legal widened type, would still trigger
the assert.
This commit fixes that, by only scalarizing a trunc's result when the
operand has already been scalarized, and introducing an extract_elt
otherwise.  
This is similar to r205625.
Fixes PR20777.
llvm-svn: 220937
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Earlier this summer I fixed an issue where we were incorrectly combining
multiple loads that had different constraints such alignment, invariance,
temporality, etc. Apparently in one case I made copt paste error and swapped
alignment and invariance.
Tests included.
rdar://18816719
llvm-svn: 220933
 | 
| | 
| 
| 
| 
| 
|  | 
when inlining two calls to the same function from the same call site.
llvm-svn: 220923
 | 
| | 
| 
| 
|  | 
llvm-svn: 220857
 | 
| | 
| 
| 
|  | 
llvm-svn: 220759
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
sets as keys into a cache of interference matrice values in the Interference
constraint adder.
Creating interference matrices was one of the large remaining time-sinks in
PBQP. Caching them reduces the total compile time (when using PBQP) on the
nightly test suite by ~10%.
llvm-svn: 220688
 | 
| | 
| 
| 
|  | 
llvm-svn: 220658
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
So that it has access to getOrCreateGlobalVariableDIE. If we ever support
decsribing using directive in C++ classes (thus requiring support in type
units), it will certainly use another mechanism anyway.
Differential Revision: http://reviews.llvm.org/D5975
llvm-svn: 220594
 | 
| | 
| 
| 
|  | 
llvm-svn: 220581
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
(part of refactoring to allow subprogram emission in both the skeleton
and main units to enable -gmlt-like data to be included in the skeleton
for live inlined backtracing purposes)
llvm-svn: 220578
 | 
| | 
| 
| 
| 
| 
| 
|  | 
It was only being used as a flag to identify the lack of debug info from
within endModule - use the section labels for that instead.
llvm-svn: 220575
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is a first step for generating SSE rsqrt instructions for
reciprocal square root calcs when fast-math is allowed.
For now, be conservative and only enable this for AMD btver2
where performance improves significantly - for example, 29%
on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c
(if we convert the data type to single-precision float).
This patch adds a two constant version of the Newton-Raphson
refinement algorithm to DAGCombiner that can be selected by any target
via a parameter returned by getRsqrtEstimate()..
See PR20900 for more details:
http://llvm.org/bugs/show_bug.cgi?id=20900
Differential Revision: http://reviews.llvm.org/D5658
llvm-svn: 220570
 |