diff --git a/proposals/p5448.md b/proposals/p5448.md
new file mode 100644
index 0000000000000..4835e9a1bf0ee
--- /dev/null
+++ b/proposals/p5448.md
@@ -0,0 +1,386 @@
+# Carbon <-> C++ Interop: Primitive Types
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/5448)
+
+## Table of contents
+
+- [Abstract](#abstract)
+- [Problem](#problem)
+- [Background](#background)
+    - [Carbon Primitive Types](#carbon-primitive-types)
+    - [C++ Fundamental Types](#c-fundamental-types)
+        - [void](#void)
+        - [std::nullptr_t](#stdnullptr_t)
+        - [std::byte](#stdbyte)
+        - [Character types](#character-types)
+        - [Signed integer types](#signed-integer-types)
+        - [Unsigned integer types](#unsigned-integer-types)
+        - [Floating-point types](#floating-point-types)
+    - [Data models](#data-models)
+- [Proposal](#proposal)
+- [Details](#details)
+    - [C++ -> Carbon mapping](#c---carbon-mapping)
+    - [Carbon -> C++ mapping](#carbon---c-mapping)
+- [Rationale](#rationale)
+- [Alternatives considered](#alternatives-considered)
+- [Open questions](#open-questions)
+
+## Abstract
+
+Define the type mapping of the primitive types between Carbon and C++.
+
+## Problem
+
+Interoperability of Carbon with C++ is one of the Carbon language goals (see
+[Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)).
+Providing
+[unsurprising mappings between C++ and Carbon types](/docs/design/interoperability/philosophy_and_goals.md#unsurprising-mappings-between-c-and-carbon-types)
+is one of its sub-goals.
+
+This proposal addresses the type mapping between the two languages to support
+achieving this goal.
+
+## Background
+
+### Carbon Primitive Types
+
+Carbon has the following
+[primitive types](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/design/README.md#primitive-types):
+
+- `bool`: boolean type taking `true` or `false`
+- integer types:
+    - signed integer types: `iN` (`N` - bit width, a positive multiple of 8)
+        - `i8`, `i16`, `i32`, `i64`, `i128`, `i256`
+    - unsigned integer types: `uN` (`N` - bit width, a positive multiple of 8)
+        - `u8`, `u16`, `u32`, `u64`, `u128`, `u256`
+- floating-point types: `fN` (`N` - bit width, a positive multiple of 8),
+  IEEE-754 format
+    - `f16`, `f32`, and `f64` - always available
+    - `f80`, `f128`, or `f256` may be available, depending on the platform
+
+### C++ Fundamental Types
+
+C++ calls the primitive types
+[fundamental types](https://en.cppreference.com/w/cpp/language/types). The
+following fundamental types exist in C++:
+
+- `void`
+- `std::nullptr_t`
+- `std::byte`
+- integral types (also integer types):
+    - `bool`
+    - character types:
+        - narrow character types: `signed char`, `unsigned char`, `char`,
+          `char8_t` (since C++20)
+        - wide character types: `char16_t`, `char32_t`, `wchar_t`
+    - signed integer types:
+        - standard signed integer types: `signed char`, `short`, `int`,
+          `long`, `long long`
+        - extended signed integer types (implementation-defined)
+    - unsigned integer types:
+        - standard unsigned integer types: `unsigned char`, `unsigned short`,
+          `unsigned int`, `unsigned long`, `unsigned long long`
+        - extended unsigned integer types
+- floating-point types:
+    - standard floating-point types: `float`, `double`, `long double`
+    - extended floating-point types:
+        - fixed width floating-point types (since C++23): `float16_t`,
+          `float32_t`, `float64_t`, `float128_t`, `bfloat16_t`
+        - other implementation-defined extended floating-point types
+
+#### void
+
+Objects of type `void`, arrays of `void`, and references to `void` are not
+allowed. Pointers to `void` and functions returning `void` are allowed.
+
+#### std::nullptr_t
+
+The type of `nullptr` (the null pointer literal). It's a distinct type that is
+not itself a pointer type.
+
+#### std::byte
+
+| Type | Width in bits | Notes |
+| ----------- | ------------- | ------------------------------------------------------------ |
+| `std::byte` | 8-bit | can be used to access raw memory, same as `unsigned char`, but it's not a character type and is not an arithmetic type |
+
+#### Character types
+
+| Type | Width in bits | Notes |
+| --------------- | ---------------------------------- | ------------------------------------------------------------ |
+| `char` | 8-bit | multibyte characters; same representation, alignment and signedness as either `signed char` or `unsigned char` (platform-dependent), but it's a distinct type |
+| `signed char` | 8-bit | signed character representation |
+| `unsigned char` | 8-bit | unsigned character representation; raw memory access |
+| `char8_t` | 8-bit | UTF-8 character representation; same size, alignment and signedness as `unsigned char`, but a distinct type |
+| `char16_t` | 16-bit | UTF-16 character representation; same size, alignment and signedness as `std::uint_least16_t`, but a distinct type |
+| `char32_t` | 32-bit | UTF-32 character representation; same size, alignment and signedness as `std::uint_least32_t`, but a distinct type |
+| `wchar_t` | 32-bit on Linux, 16-bit on Windows | wide character representation; holds UTF-32 on Linux and other non-Windows platforms, UTF-16 on Windows |
+
+#### Signed integer types
+
+**Standard signed integer types**
+
+| Type | Width in bits |
+| ------------- | ------------------------------------------------------------ |
+| `signed char` | 8-bit |
+| `short` | 16-bit |
+| `int` | 32-bit (except on [LP32](/proposals/p5448.md#data-models), not supported by Carbon, where it is 16-bit) |
+| `long` | 32-bit (except on [LP64](/proposals/p5448.md#data-models), where it is 64-bit) |
+| `long long` | 64-bit |
+
+**Exact-width integer types**
+
+Typically aliases of the standard integer types.
+
+| Type | Width in bits | Defined as |
+| -------------- | ------------- | ------------------------------------------------------------ |
+| `std::int8_t` | 8-bit | `typedef signed char int8_t` |
+| `std::int16_t` | 16-bit | `typedef signed short int16_t` ([LP32](/proposals/p5448.md#data-models), not supported by Carbon: `typedef signed int int16_t`) |
+| `std::int32_t` | 32-bit | `typedef signed int int32_t` ([LP32](/proposals/p5448.md#data-models), not supported by Carbon: `typedef signed long int32_t`) |
+| `std::int64_t` | 64-bit | Depending on the platform, either `typedef signed long int64_t` (for example [LP64](/proposals/p5448.md#data-models)) or `typedef signed long long int64_t` (for example [LLP64](/proposals/p5448.md#data-models)) |
+
+**Fastest minimum-width integer types**
+
+Integer types that are usually fastest to operate with among all integer types
+that have the minimum specified width.
+
+| Type | Width in bits | Defined as |
+| ------------------- | ------------- | ------------------------------------------------------------ |
+| `std::int_fast8_t` | >=8-bit | `typedef signed char int_fast8_t` |
+| `std::int_fast16_t` | >=16-bit | implementation dependent |
+| `std::int_fast32_t` | >=32-bit | implementation dependent |
+| `std::int_fast64_t` | >=64-bit | [LP64](/proposals/p5448.md#data-models): `typedef signed long int_fast64_t` |
+| | | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long int_fast64_t` |
+
+**Minimum-width integer types**
+
+Smallest signed integer type with a width of at least N bits.
+
+| Type | Width in bits | Defined as |
+| -------------------- | ------------- | ------------------------------------------------------------ |
+| `std::int_least8_t` | >=8-bit | `typedef signed char int_least8_t` |
+| `std::int_least16_t` | >=16-bit | `typedef short int_least16_t` ([LP32](/proposals/p5448.md#data-models), not supported by Carbon: `typedef signed int int_least16_t`) |
+| `std::int_least32_t` | >=32-bit | `typedef int int_least32_t` ([LP32](/proposals/p5448.md#data-models), not supported by Carbon: `typedef signed long int int_least32_t`) |
+| `std::int_least64_t` | >=64-bit | [LP64](/proposals/p5448.md#data-models): `typedef signed long int_least64_t` |
+| | | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long int_least64_t` |
+
+**Greatest-width integer types**
+
+Maximum-width signed integer type.
+
+| Type | Width in bits | Defined as |
+| --------------- | ------------- | ------------------------------------------------------------ |
+| `std::intmax_t` | >=32-bit | [LLP64](/proposals/p5448.md#data-models): `typedef signed long long intmax_t` |
+| | | [LP64](/proposals/p5448.md#data-models): `typedef signed long intmax_t` |
+
+**Integer types capable of holding object pointers**
+
+Signed integer type, capable of holding any pointer.
+
+| Type | Width in bits | Defined as |
+| --------------- | ------------- | ------------------------------------------------------------ |
+| `std::intptr_t` | >=16-bit | `typedef long intptr_t` on most platforms |
+| | | `typedef int intptr_t` on some [ILP32](/proposals/p5448.md#data-models); [LLP64](/proposals/p5448.md#data-models) |
+
+**Other signed integer types**
+
+| Type | Width in bits | Defined as |
+| ----------- | ------------- | ----------------------------------------------------- |
+| `ptrdiff_t` | >=16-bit | `typedef std::intptr_t ptrdiff_t` (on most platforms) |
+| | | Holds the result of subtracting two pointers. |
+
+#### Unsigned integer types
+
+The unsigned integer types have the same sizes as their
+[signed counterparts](/proposals/p5448.md#signed-integer-types).
+
+| Type | Width in bits | Defined as |
+| -------- | ------------- | ---------------------------------------------- |
+| `size_t` | >=16-bit | `typedef uintptr_t size_t` (on most platforms) |
+| | | Holds the result of the `sizeof` operator. |
+
+#### Floating-point types
+
+**Standard floating-point types**
+
+| Type | Format | Width in bits | Note |
+| ------------- | --------------------------------- | ---------------- | ------------------------------------------------------------ |
+| `float` | usually IEEE-754 binary32 | 32-bit | The format or the size can vary depending on the compiler and the platform. |
+| `double` | usually IEEE-754 binary64 | 64-bit | The format or the size can vary depending on the compiler and the platform. |
+| `long double` | IEEE-754 binary128 | 128-bit | used by some SPARC, MIPS, ARM64 implementations. |
+| | IEEE-754 binary64-extended format | 80-bit or 64-bit | 80-bit (most x86 and x86-64 implementations); 64-bit used by MSVC. |
+| | `double-double` | 128-bit | used on PowerPC. |
+
+**Fixed-width floating-point types (C++23)**
+
+These are not aliases of the standard floating-point types (`float`, `double`,
+`long double`), but of extended floating-point types.
+
+| Type | Width in bits | Defined as |
+| ----------------- | ------------- | ------------------------------ |
+| `std::float16_t` | 16-bit | `using float16_t = _Float16` |
+| `std::float32_t` | 32-bit | `using float32_t = _Float32` |
+| `std::float64_t` | 64-bit | `using float64_t = _Float64` |
+| `std::float128_t` | 128-bit | `using float128_t = _Float128` |
+| `std::bfloat16_t` | 16-bit | |
+
+### Data models
+
+The following data models are widely accepted:
+
+- 32-bit systems:
+    - `LP32` (Win16 API): `int` 16-bit; `long` 32-bit; `pointer` 32-bit.
+    - `ILP32` (Win32 API; Unix and Unix-like systems): `int` 32-bit; `long`
+      32-bit; `pointer` 32-bit.
+- 64-bit systems:
+    - `LLP64` (Win32 API: 64-bit ARM or x86-64): `int` 32-bit; `long` 32-bit;
+      `pointer` 64-bit.
+    - `LP64` (Unix and Unix-like systems (Linux, macOS)): `int` 32-bit; `long`
+      64-bit; `pointer` 64-bit.
+
+[Carbon supported platforms](/docs/project/principles/success_criteria.md#modern-os-platforms-hardware-architectures-and-environments)
+
+Carbon will prioritize supporting modern OSes and 64-bit little-endian
+platforms (for example [LLP64](/proposals/p5448.md#data-models),
+[LP64](/proposals/p5448.md#data-models)). Historic platforms like
+[LP32](/proposals/p5448.md#data-models) won't be supported.
+
+## Proposal
+
+- C++ `intN_t` types and the standard integer types `signed char`, `short`,
+  and `int` will map to the correspondingly sized Carbon `iN` type (for
+  example `int32_t`, `int` -> `i32`).
+- Carbon `iN` types will map to C++ `intN_t` types (for example `i32` ->
+  `int32_t`).
+- `long`, `long long` in C++ will map to new Carbon types (for example
+  `Core.Cpp.long` and `Core.Cpp.long_long`).
+- Correspondingly, the same will hold for the unsigned types.
+- `float` will map to `f32`; `double` to `f64` and `long double` will map to a
+  new Carbon type (for example `Core.Cpp.long_double`).
+
+## Details
+
+### C++ -> Carbon mapping
+
+| Carbon | C++ |
+| ----------------------------- | ------------------------------------------------------------ |
+| `()` as a return type | `void` |
+| `bool` | `bool` |
+| `i8` | `int8_t`, `signed char`, `int_fast8_t`, `int_least8_t` |
+| `i16` | `int16_t`, `short`, `int_least16_t` |
+| `i32` | `int32_t`, `int`, `int_least32_t`, some [ILP32](/proposals/p5448.md#data-models), [LLP64](/proposals/p5448.md#data-models): `intptr_t`, `ptrdiff_t` |
+| `i64` | `int64_t` |
+| `i128` | `int128_t` |
+| `u8` | `uint8_t`, `unsigned char`, `uint_fast8_t`, `uint_least8_t` |
+| `u16` | `uint16_t`, `unsigned short`, `uint_least16_t` |
+| `u32` | `uint32_t`, `unsigned int`, `uint_least32_t`, some [ILP32](/proposals/p5448.md#data-models), [LLP64](/proposals/p5448.md#data-models): `uintptr_t`, `size_t` |
+| `u64` | `uint64_t` |
+| `u128` | `uint128_t` |
+| `f16` | `std::float16_t (_Float16)` |
+| `f32` | `float` |
+| `f64` | `double` |
+| `f128` | `std::float128_t (_Float128)` |
+| `Core.Cpp.long` | `long`, [LP64](/proposals/p5448.md#data-models): `int_fast64_t`, `int_least64_t`, `intmax_t`, `intptr_t`, `ptrdiff_t` |
+| `Core.Cpp.long_long` | `long long`, [LLP64](/proposals/p5448.md#data-models): `int_fast64_t`, `int_least64_t`, `intmax_t` |
+| `Core.Cpp.unsigned_long` | `unsigned long`, [LP64](/proposals/p5448.md#data-models): `uint_fast64_t`, `uint_least64_t`, `uintmax_t`, `uintptr_t`, `size_t` |
+| `Core.Cpp.unsigned_long_long` | `unsigned long long`, [LLP64](/proposals/p5448.md#data-models): `uint_fast64_t`, `uint_least64_t`, `uintmax_t` |
+| `Core.Cpp.long_double` | `long double` |
+| X (No mapping) | `float32_t`, `float64_t`, `bfloat16_t` |
+| TBD | `char`, `charN_t`, `wchar_t` |
+| TBD | `std::byte` |
+| TBD | `std::nullptr_t` |
+
+Notes:
+
+- Multiple C++ types map to one Carbon type only in the cases when these are
+  type aliases in C++, not distinct types, which should not pose any issues
+  for overloading.
+- `float{32,64}_t`, `bfloat16_t` - won't be supported in Carbon at this point
+  and may be reconsidered later.
+- `std::int_fast{16,32}_t` - will map to the corresponding Carbon type based
+  on the C++ definition.
+- New types:
+    - The names `Core.Cpp.T` represent new Carbon types that don't exist at
+      the moment and their exact names will still need to be defined.
+    - `Core.Cpp.long`: either 32-bit or 64-bit, using an appropriate integer
+      type that is compatible with (but different from) `i32`/`i64`.
+    - `Core.Cpp.long_long`: a platform-dependent alias of `i64`.
+    - `Core.Cpp.long_double`: TBD.
+- All of the mappings are direct mappings without any runtime performance
+  overhead.
+
+### Carbon -> C++ mapping
+
+The same mapping as above, except for the cases with multiple C++ entries, when
+Carbon types will map to the first one in the C++ column (Carbon `iN` -> C++
+`intN_t`).
+
+Carbon won't have aliases or distinct types for `int_fastN_t`, `int_leastN_t`,
+`intmax_t`, `intptr_t`, `ptrdiff_t`, `size_t`. This may be reconsidered in the
+future if the need arises.
+
+Some Carbon types may not have direct mappings in C++: `i256`, `u256`, `f80`,
+`f256`.
+
+## Rationale
+
+One of Carbon's goals is seamless interoperability with C++ (see
+[Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)),
+calling for clarity of the calls and high performance.
+
+The proposal maps the Carbon types to their direct equivalents in C++, with zero
+overhead, supporting the request for unsurprising mappings between C++ and
+Carbon types with high performance.
+
+## Alternatives considered
+
+`long`
+
+- Provide platform-dependent conversion functions for `long`.
+    - Advantages: the conversions will be clearly outlined.
+    - Disadvantages: performance overhead for certain platforms.
+- Map `long` always to a fixed-sized Carbon type depending on the platform
+  (for example to either `i32` or `i64`).
+    - Advantages: all the code will be using fixed-sized types.
+    - Disadvantages: the same C++ function may map differently on different
+      platforms and the Carbon code should compensate for that to make the
+      code compile.
+
+`float32_t`, `float64_t`
+
+- Map `f32` <-> `float32_t` and `f64` <-> `float64_t`
+    - Advantages: follow the same analogy as for the integer types (`iN` <->
+      `intN_t`)
+    - Disadvantages:
+        - `float32_t`, `float64_t` are new types since C++23, so this won't be
+          directly achievable, but the corresponding `_FloatN` types will need
+          to be used for the older C++ versions.
+        - they are not aliases for the standard floating-point types (`float`,
+          `double`, `long double`), but for extended floating-point types, so
+          type conversions will be needed for the standard types.
+
+## Open questions
+
+The mapping of the following types remains open and will be discussed at a later
+point:
+
+- `char`, `char8_t`, `char16_t`, `char32_t`, `wchar_t`
+    - Carbon doesn't have character types yet, so the mapping of these types
+      will be discussed once they are available.
+    - These are all distinct types in C++, which should be taken into account
+      to prevent any issues for overloading.
+- `std::byte`
+- `std::nullptr_t`
+- `void*`
+- `Core.Cpp.long_double` - details of this new type are still to be discussed.
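
The width assumptions behind the mapping in the diff above can be sanity-checked on a given C++ toolchain with a few compile-time checks. The sketch below is not part of the proposal: `BitWidth`, `DataModel`, `HostDataModel`, `kInt64IsLong`, `kInt64IsLongLong`, and `PrintHostSummary` are hypothetical names introduced only for illustration, and the alias checks assume a mainstream toolchain where `int8_t`, `int16_t`, and `int32_t` are typedefs of `signed char`, `short`, and `int`.

```cpp
#include <climits>
#include <cstdint>
#include <cstdio>
#include <limits>
#include <type_traits>

// Width of T in bits, as seen by the compiler building this file.
template <typename T>
constexpr int BitWidth() {
  return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Hypothetical enum naming the data models listed in the Background section.
enum class DataModel { kLP32, kILP32, kLLP64, kLP64, kOther };

// True if `int`, `long`, and pointers have exactly the given widths.
constexpr bool Matches(int int_bits, int long_bits, int pointer_bits) {
  return BitWidth<int>() == int_bits && BitWidth<long>() == long_bits &&
         BitWidth<void*>() == pointer_bits;
}

// Classifies the host from the widths of `int`, `long`, and pointers.
constexpr DataModel HostDataModel() {
  return Matches(16, 32, 32)   ? DataModel::kLP32
         : Matches(32, 32, 32) ? DataModel::kILP32
         : Matches(32, 32, 64) ? DataModel::kLLP64
         : Matches(32, 64, 64) ? DataModel::kLP64
                               : DataModel::kOther;
}

// Width assumptions behind the fixed-size rows of the mapping table.
static_assert(BitWidth<signed char>() == 8, "signed char -> i8");
static_assert(BitWidth<short>() == 16, "short -> i16");
static_assert(BitWidth<int>() == 32, "int -> i32");
static_assert(BitWidth<float>() == 32 && std::numeric_limits<float>::is_iec559,
              "float -> f32");
static_assert(BitWidth<double>() == 64 &&
                  std::numeric_limits<double>::is_iec559,
              "double -> f64");

// Assumption: the exact-width types are aliases of the standard types, so
// each of these rows involves a single C++ type.
static_assert(std::is_same<std::int8_t, signed char>::value, "int8_t alias");
static_assert(std::is_same<std::int16_t, short>::value, "int16_t alias");
static_assert(std::is_same<std::int32_t, int>::value, "int32_t alias");

// Which standard type `int64_t` aliases depends on the data model.
constexpr bool kInt64IsLong = std::is_same<std::int64_t, long>::value;
constexpr bool kInt64IsLongLong = std::is_same<std::int64_t, long long>::value;

// Prints the detected data model and the platform-dependent widths.
inline void PrintHostSummary() {
  // Order matches the enumerators of DataModel.
  static const char* const kNames[] = {"LP32", "ILP32", "LLP64", "LP64",
                                       "other"};
  std::printf("data model: %s\n", kNames[static_cast<int>(HostDataModel())]);
  std::printf("long: %d-bit, long long: %d-bit\n", BitWidth<long>(),
              BitWidth<long long>());
  std::printf("int64_t aliases: %s\n", kInt64IsLong       ? "long"
                                       : kInt64IsLongLong ? "long long"
                                                          : "another type");
}
```

Calling `PrintHostSummary()` from a `main` function reports the detected model. Per the data models listed in the proposal, `long` is 64-bit on LP64 and 32-bit on LLP64, which is the platform variation the proposal addresses by giving `long` its own Carbon type instead of a fixed `iN`.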