A proposal to add lambda

functions to the C++ standard

 

Valentin Samko

cxxpanel.valentin@samko.info

Project: ISO C++

Date: 2006-02-23

Document no: N1958=06-0028

1.          Introduction

A number of languages provide a way of passing code as arguments without having to define a separate named function [5]. For example, Algol 68 has downward funargs, Lisp has closures, in Smalltalk this is called “code blocks” which can be returned, passed as a parameter and executed later. Similar functionality is present in C# 2.0 (closures) and Java (anonymous classes). This concept is also not foreign to C++ as it extends the existing mechanisms and there are well known attempts (Boost.Lambda [9], Boost.Bind [8]) to introduce this functionality in C++ as a library solution.

Closures typically appear in languages that allow functions to be first-class values, in other words, such languages allow functions to be passed as arguments, returned from function calls, bound to variable names, etc., just like simpler types such as strings and integers.

Wikipedia

C++ has a concept of function objects which are the first class values, can be passed as arguments and returned, bound to variable names. Also, the well known Boost.Bind [8] library which was approved for TR1 was designed to bind parameters to function objects, creating new function objects. With the recent addition of normalised function pointers (tr1::function) this makes closures a missing logical extension of the C++ language, and in fact lambda functions ES017, ES062 [6] are in the list of active proposals of the C++ evolution group.

Many algorithms in the C++ Standard Library require the user to pass a predicate or any other functional object, and yet there is usually no simple way to construct such a predicate in place. Instead one is required to leave the current scope and declare a class outside of the function, breaking an important rule of declaring names as close as possible to the first use. This shows that lambda functions would add a great to the expressive power and ease of use of the C++ Standard Library.

We will refer to function objects which represent closures as lambda function objects, or lambda functions.

1.1.   Motivation

·          C++ Standard Library algorithms would be much more pleasant to use if C++ had support for lambdas. Lambda functions would let people use C++ Standard Library algorithms in many cases where currently it is easier to write a for loop. Many developers do not use function objects simply because of the syntactic overhead.

·          A small set of trivial lambda functions can also be created with tr1::bind, but this proposal introduces a much better syntax which one can easily use (tr1::bind syntax is too complicated in many cases). For example, having

struct A { int foo(int y = 1, bool b = false); };

A a;

compare

tr1::function<int(int, int)> f = boost::bind(

 &A::foo,

 &a,

 tr1::bind(&A::foo, &a, 1, false),

 tr1::bind(&A::foo, &a, _1, false) <

 tr1::bind(&A::foo, &a, _2, false)

 );

with

tr1::function<int(int, int)> f = int(int x, int y) {

 return a.foo(a.foo(), a.foo(x) < a.foo(y));

};

 

·          There is also a general understanding that closures are one of the most important missing C++ features, and there were many well known attempts to solve this problem on the library level (Boost.Bind [8], Boost.Lambda [9], Standard C++ library binders), but they introduce a very complicated and unreadable syntax, also leading to code which is very hard to debug and or analyse crash dumps. For example, in many environments developers are advised not to use these libraries as it takes significantly more time to analyse production problems one you have non trivial Boost.Bind or Boost.Lambda expressions in your call stack. Although the authors of these libraries did the best one can do in the current C++, and one can learn numerous highly non trivial programming techniques from these libraries, they introduce a new syntax for existing C++ constructs, making them very hard to read and understand. Another problem is that when one compiles source code which uses these libraries having inlining disabled, the performance is often severely affected. This again shows that lambda expressions are one of the most important missing features in C++ which can not be emulated in a reasonable way in a library.

·          If one defines a function in a header file and needs to create a predicate for use in that function, that predicate class currently has to be defined outside of that function and so it is visible in every translation that includes that header file, although that predicate is supposed to be internal to that function.

·          tr1::bind like solutions can not be used with many overloaded functions. For example, tr1::bind(&std::abs, _1) does not compile, and one has to write tr1::bind( (Type(*)(Type))&std::abs, _1)  where Type is the corresponding type name. Also, tr1::bind( &std::set<int>::find, _1, _2) does not compile for the same reason. This makes it very hard to use tr1::bind with the standard containers.

·          Another problem with tr1::bind (although this can be considered to be an implementation detail, but the most popular implementation Boost.Bind has this problem) is that all the parameters are passed to the created function object by reference and thus void foo(int) {} boost::bind(&foo, _1) (1); fails to compile, since 1 is not a const object of type int, and you can not pass it as a non const int reference either.

·          Most users will not use algorithms which require functional objects as long as one has to write more code to construct such function objects than one would write to “embed” that algorithm into their code. This effectively means that many generic programming patterns will not enter the mainstream until C++ supports a simple and efficient way to construct such function objects inline, just like one can construct other expressions. For example, having

typedef std::set<int> myset;

typedef std::vector<int> myvec;

the following four examples implement the same functionality in the function foo using lambda functions proposed in this document, using C++ algorithm with a custom predicate, using C++ algorithm with a predicate composed with tr1::bind, and with the algorithm code embedded in the function. Note, the last one is less optimal than any reasonable standard library implementation, and an implementation with tr1::bind is non portable as it assumes that myset::find does not have any additional parameters with default values.

(1) lambda function

(3) custom code instead of the standard algorithm

void foo(

 myvec& v, const myset& s, int a) {

 // ...

 v.erase(

  std::remove_if(v.begin(), v.end(),

   bool(int x) {

     return std::abs(x) < a

       && s.find(x) != s.end(); }),

  v.end()

  );

}

void foo(

 myvec& v, const myset& s, int a) {

 // ...

 myvec::iterator new_end = v.begin();

 for (myvec::iterator i = v.begin();

  i != v.end();

  ++i) {

  if ( ! (std::abs(*i) < a

    && s.find(*i) != s.end()) )

   *(new_end++) = *i;

  }

 v.erase(new_end, v.end());

}

(2) custom predicate

(4) tr1::bind

struct MyPredicate {

 MyPredicate(const myset& s, int a)

  : s_(s), a_(a) {}

 bool operator()(int x) const {

  return std::abs(x) < a_

    && s_.find(x) != s_.end();

 }

private:

 const myset& s_;

 int a_;

};

 

void foo(

 myvec& v, const myset& s, int a) {

 // ...

 v.erase(

  std::remove_if(v.begin(), v.end(),

   MyPredicate(s, a)),

  v.end()

  );

}

void foo(

 myvec& v, const myset& s, int a) {

 // ...

 v.erase(

  std::remove_if(

   v.begin(),

   v.end(),

   tr1::bind(

    std::logical_and<bool>(),

    (tr1::bind(

     (int(*)(int))&std::abs, _1) < a),

    (tr1::bind(

      (myset::const_iterator(myset::*)(const int&)const)&myset::find,

       &s, _1) != s.end())

    )

   ),

  v.end()

  );

}

In addition to much more readable code (which example do you prefer to read to understand what foo does?), this example shows that with the current C++ standard the only sensible options are (2) and (3), and many developers pick (3) because it is shorter, and one does not need to define a separate structure which is visible to everyone else. Still, option (2) is generally faster than (3) as mentioned above. We note that if lambda functions are supported natively by the language, option (1) would be the most simple and brief as seen from this example. The size of the predicate functional object is also likely to be smaller with lambda functions as compiler may optimise away the extra reference and only store one pointer to the relevant stack frame in the predicate class.

·          With lambda functionality described in this proposal we do not need a set of existing proposals, such as enhanced bindings [7], mem_fn adaptor [10],  callable pointers to members [11].

·          C++ algorithms are usually more optimal than a user implementation of the same functionality embedded into other functions, as the standard implementation of these algorithms may contain non trivial optimisations and be tailored for specific C++ standard library data structures. Because of the absence of a simple way to construct necessary predicates inline, many users miss these optimisations.

·          One can not use tr1::bind in a portable manner with functions in the std namespace and member functions of standard C++ library containers as the implementation is allowed to add extra parameters with default arguments to these functions. There would be no such problem if C++ had a native support for lambda functions. The same problem exists with any 3rd party library when vendor adds a new argument with a default value to a function which is used in tr1::bind expressions by a client.

·          Recently Scott Meyers raised a question at comp.lang.c++.moderated (see the thread “Manual Loops vs STL Algorithms in the Real World”) asking whether it's really reasonable to expect C++ programmers to use STL algorithms instead of writing their own loops. It was stated that the real life code is usually less trivial than the common examples of using standard library binders, and function objects overcomplicate the code. This problem would be solved by lambda functions proposed in this document as lambda functions introduce a very short and convenient notation to define function objects inline and pass them to any algorithms.

·          A short but a very important note in favour of native support for lambda functions is that with lambda functions the overall quality of C++ code would improve with the new code containing less bugs which very often occur when programmers “embed” algorithms into their code. I.e. lambda functions would give a much needed boost to the reuse of generic algorithms.

1.2.   Required functionality

·          Lambda functions must be the first class C++ citizens, i.e. objects. These objects must have class types, and these classes must have the "result_type, arg1_type, ..." typedefs and the corresponding operator().

·          Lambda objects must be copyable. When a lambda object is copied, all the objects and references bound to that lambda object must be copied.

·          A lambda must have the same access to the names and entities outside of the lambda definition as the scope enclosing that lambda definition. There is already a similar case for local classes (9.8/1) where declarations in a local class can use type names, enum values, extern variables and static and global variables from the enclosing scope.

·          Lambdas must be significantly easier to read, write and debug than any possible library solutions.

·          Lambdas must be compatible with the proposed tr1::bind and tr1::function libraries, i.e. one should be able to create tr1::function objects from lambda objects and bind parameters to lambda objects with tr1::bind.

1.3.   Definitions

            Lambda objectA function object of an implementation dependent class type with operator() and result_type, etc. typedefs which can be created by a primary expression.

            Declaration of a lambda object – A primary expression which creates a temporary lambda function object.

            Definition of a lambda object – A declaration of a primary object is also a definition.

            Lambda body – A code block which can be executed, given a set of parameters and context.

            Local lambda – Lambda definition in a function, or in a lambda body.

            Lambda – An expression which contains code and refers to certain variables, which is not evaluated immediately, and can be passed to functions which will evaluate it when needed, or stored to be evaluated in the future.

            Lambda bound values initialiser – Declaration and definition of variables which are contained in a lambda object, copy constructed when the lambda object is constructed or copy constructed. They can be used to bind any values to the lambda function by value.

1.4.   Lambda objects and normalised function pointer type

1.       We define normalised function pointers as normalised types for any expressions which can be called with a function call syntax. A main requirement for such normalised function pointers is that  all the  normalised function pointers which return objects of the same type and accept parameters of the same type do have the same type and can be copied and assigned and it does not matter whether they refer to plain function pointers, or to complicated function objects. tr1::function is an example of a library solution for such normalised function pointers. Some languages support normalised function pointers directly.

2.       Lambda functions are not required to be normalised function pointers. Lambda objects created by different code are not required to be assignable or to have the same layout, even if they return values of the same type and their arguments have same types. One can always normalise any lambda object using the library solutions like tr1::function. Even if normalised function pointers are added to the C++ standard, they will not affect lambda functions in any way since lambda functions are orthogonal to normalised function pointers (in the same way as the result of a tr1::bind expression is orthogonal to tr1::function).

3.       The fact that lambda objects are function objects and not function pointers is consistent with existing C++ practices as we already have a notion of function objects, which are objects classes with operator(). Also, normalised function pointers would have the same limitation that any other function pointers have, for example function calls through such pointers can rarely be inlined.

4.       Unlike function pointers, lambda objects are not comparable with 0 and do not have operator ! as they are always valid, one can not have an uninitialised lambda object.

1.5.   Proposed syntax

  ret_type(type1 value1, type2 value2, ...) { statements }

 

All the names and entities visible and accessible in the scope where lambda is declared must be also visible and accessible in the body of the lambda object. For example

void f(int x) {

 std::vector<int> v;

 // ...

 std::remove_if(v.begin(), v.end(), void (int& n) { n < x; });

}

 

Optionally, one can bind values to lambda objects which will have the same life time as the lambda object by using the lambda bound values initialiser, such as (typea boundvalue1(x), typeb boundvalue2(y), ...). This initialiser can be used to pass variables to the lambda by value, as by default nothing is passed to the lambda, all the variables in the enclosing scope are automatically visible in the lambda function body and it is undefined behaviour if lambda refers to variables whose storage has been released. The complete lambda definition with bound values initialiser is

ret_type (type1 value1, type2 value2, ...)

 : (typea boundvalue1(x), typeb boundvalue2(y), ...)

{ statements }

 

For example

void set_callback(tr1::function<bool(int)>);

void foo(int t) {

 set_callback(bool(int i) : (int number(t)) { std::cout<<(i + number); });

}

 

A lambda definition is a primary expression and so it can be used anywhere where any other primary expression can be used. For example,

int x = 0;

struct A {

 tr1::function<void()> f;

 A() : f(void(){ ++x; }) {}

 void foo(tr1::function<bool()> p = bool(){return x<0;}) { p(); }

 static tr1::function<int(bool)> f2;

 void bar() { throw void() { --x; }; } // can only be caught in (...)

};

tr1::function<int(bool)> A::f2 = int(bool b) { return b ? 1 : 2; };

2.          Specification

2.1.   Lambda definition

A lambda definition defines an implementation dependant lambda class with external linkage and a temporary object of that class (12.2). Lambda definitions are primary expressions and have the following grammar

  lambda-definition:

    type-specifier lambda-declarator lambda-bound-values-declaratoropt { lambda-body }

 

  lambda-declarator:

    ( parameter-declaration-clause )

 

  lambda-bound-values-declarator:

    : ( lambda-bound-values-list )

 

  lambda-body:

    compound-statement

 

  lambda-bound-values-list:

    lambda-bound-value

    lambda-bound-values-list, lambda-bound-value

 

  lambda-bound-value:

    decl-specifier-seq declarator(assignment-expression)

    decl-specifier-seq abstract-declarator(assignment-expression)

 

where parameters in the parameter-declaration-clause are not visible in assignment-expression elements in the lambda-bound-values-declarator.

2.2.   Lambda classes

1.       Lambda objects have class types, and expressions

·          ret_type (type1 param1, type2 param2, ...) { statements }

·          ret_type (type1 param1, type2 param2, ...) : (type3 boundvalue1(v1), type4 boundvalue2(v2), ...) { statements }

create temporary lambda objects of  classes with implementation dependent names, where each such expression can possibly result in a different class. The fact that the type of lambda classes is unspecified is consistent with the proposal for an enhanced binder [7] which was approved for TR1, where the type of the returned function objects is also unspecified.

2.       If a lambda is defined within a function template, or a member function of a class template, it is essential that every unique instantiation of the template yields a unique type for the lambda class, just as every unique instantiation of the template yields a unique address for the function in which the local class is defined. If a local class is defined within a function which is itself a template, or is a member of a template, then instantiations of the template with the same set of template parameters must yield the same instance of the local class, even if the instantiations are performed in different translation units.

3.       Lambda classes may have any implementation dependent names, and classes representing different lambda objects are allowed to have different names even if they return values of the same type and have the same set of parameters. Implementation must guarantee that lambda class names do not clash with any user defined names and the observable behaviour does not depend on names of lambda classes.

4.       Lambda classes must have the public "result_type, arg1_type, ..." typedefs and public operator() const without exception specification. When operator() is called, it must invoke the lambda body. Also, no “this” pointer is declared in the lambda body, but if another “this” is visible in the scope where the lambda is declared, that “this” is visible in the lambda body, the same stands for the operator(), i.e. the lambda body can not recursively call itself directly.

5.       The implementation is required to define a public copy constructor, and not to define operator =.

6.       The implementation is required to define a public default constructor if and only if the body of the corresponding lambda definition does not refer to anything except global, static and extern variables and enums and does not contain any bound values.

7.       sizeof(lambda_type) is implementation dependent and is allowed to be different for lambda classes defined by different lambda definitions.

8.       Implementation is allowed to add any member variables to the lambda class.

9.       Declarations in the lambda body can use any type names, variables, functions and enumerators from the enclosing scope.

10.   Types of all the lambda parameters and bound values must have external linkage.

2.3.   Lambda behaviour

We follow the approach used to introduce unnamed namespaces (7.3.1.1) and define lambda behaviour in terms of the behaviour of existing language constructs in the current standard.

Lambda definition behaves as if

1.       Lambda definition is replaced by a simple type specifier of a class followed by a parenthesized expression list (5.2.3) where that class has a globally unique name is defined prior to the declaration of the function the lambda definition is defined in.

2.       All the enclosing scope variables visible at the point of lambda definition which names are used in the lambda body are passed to the constructor of that class, as well as all the bound values.

3.       Constructor of that class accepts the local scope variables by reference, and bound values by types specified in the lambda bound values initialiser.

4.       That class has members variables with types and names specified in the lambda bound values initialiser, and they all are initialised in the member initialiser of the class constructor.

5.       For every local scope variable accessed from the lambda body, that class has a member variable which has the type of a reference to the original variable, or of a reference to the type that variable refers to if that variable is a reference itself. All these references are initialised in the initializer list of the class constructor.

For example, having

std::string s = “test”;

the observable behaviour of

tr1::function<int(std::string, bool)> f =

 int(std::string x, bool b) { std::cout << s << x ; } ;

the same as of

class unique_class_name {

public:

 typedef result_type int;

 typedef arg1_type std::string;

 typedef arg2_type bool;

 unique_class_name(std::string& s_) : s(s_) {}

 int operator()(std::string x, bool b) const { std::cout << s << x; }

private:

 std::string& s;

};

tr1::function<int(std::string, bool)> f = unique_class_name(s);

and having

bool b = false;

std::string t = “test”;

the observable behaviour of

tr1::function<double(int, const std::string&)> f =

 double(int i, const std::string& s)

  : (bool f(b), std::string p(“test”))

 { std::cout << s << p << f << i << t; }

is the same as of

class unique_class_name {

public:

 unique_class_name(std::string t_)

  : t(t_), f(b), p(“test”) {}

 typedef result_type int;

 typedef arg1_type std::string;

 typedef arg2_type bool;

 double operator()(int i, const std::string& s) const

 { std::cout << s << p << f << i << t; }

private:

 bool f;

 std::string p;

 std::string& t;

};

tr1::function<double(int, const std::string&)> f = unique_class_name(b, t);

2.4.   Linkage

1.       Lambda classes should have the linkage of the enclosing function (or of the class if the lambda is defined in a class constructor or destructor, or in the member initialiser in the constructor), or of the variable being initialised if lambda class is defined in a simple-declaration in the namespace scope.

2.       If a lambda is defined in a function, all the names defined in that function are visible in the that class with the globally unique name.

2.5.   Access to names and entities from inside the lambda body

1.  The lambda body and the lambda bound values initialiser must have the same access to the names outside of the lambda definition as the scope enclosing that lambda definition.

2.  All the entities visible in the lambda body may actually be references to the original variables, but this must not be observable in the lambda  body.

3.  It is legal for lambda objects to refer to  local references even if lambda objects outlive these references but not the referenced objects.

4.  Name visibility in a lambda should be the same as if one had a function object class (in a unique namespace) with external linkage, which had members with the same names and types (references) as all the variables defined in the local scope and accessible from the point where lambda was declared, including function parameters. Whether types of these "member variables" are references or not is implementation dependent.

5.  It is undefined behaviour if there is a valid lambda object which body refers to a name of an object which storage is released.

2.6.   Where lambda expressions can be used

Anywhere where a primary expressions may be used. For example,

1.       In function bodies (function-body) of functions

2.       In an assignment-expression of function or operator parameter-declaration-clause

3.       In variable declarations (simple-declaration) in global and namespace scopes.

(1) and (2) are implementable for functions with external linkage. In case of functions with internal linkage, this may be done, as this is implemented in VC++ compilers (v7, v8). In case of (3), if the variable being declared has internal linkage or is in an unnamed namespace, the implementation must create a different lambda-object in every translation unit. For example,

int x;

namespace { int y; }

tr1::function<void()> f = void() { x += y; };

is equivalent to

namespace { lambda_class_name lambda_object; }

tr1::function<void()> f = lambda_object;

 

This would also work with proposal for type deduction N1894 [1]. For example one could write

int x;

namespace { int y; }

auto f = void() { x += y };

 

This approach would also result in correct code if a lambda object is used to initialise a variable in an unnamed namespace. For example

int x;

namespace { int y; }

namespace { tr1::function<void()> f = void() { x += y; }; }

2.7.   Impact on existing code

This proposal is only an extension and there is no impact on existing code.

2.8.   Examples

template<class T> void inherit(T t) {

  struct X : public T {

    X() {} // ill-formed, classes generated by lambda definitions which refer to local variables do not have default constructors