GNU LibC's "wcsmbs" sublibrary implements wide-character (`wchar_t`) & multibyte string processing, loosely resembling the "string" library with builtin "iconv" integration. UNIX programs now typically stick to UTF8 though, so "wcsmbs" doesn't get as much use or optimization. (Microsoft's kernel however deals in ASCII or UTF16, so things are different there.) Typically when any real text processing is desired we pull in the external LibICU library to be properly internationalized!

1/?

GNU LibC's "wcsmbs"'s `wctob` function wraps the innards of "gconv" to convert a widechar into its single-byte form (or `EOF` if it has none), with a fastpath for ASCII.
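
To make `wctob`'s contract concrete, here's a minimal sketch of a caller. `fits_in_one_byte` is a hypothetical helper name of mine, not something from glibc, & the result depends on the active locale (the default "C" locale is assumed here).

```c
#include <stdio.h>   /* EOF */
#include <wchar.h>   /* wctob, wint_t */

/* Hypothetical helper: nonzero if `wc` has a single-byte
   representation in the current locale's encoding. */
int fits_in_one_byte(wchar_t wc)
{
    return wctob((wint_t)wc) != EOF;
}
```

In the "C" locale an ASCII letter passes this check whilst a CJK ideograph shouldn't.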

`wcwidth` wraps the configured locale & "wctype" tables to look up how many columns a given widechar occupies; `wcswidth` sums that over each char in the given widestring.

`wcstok` wraps `wcsspn` (to skip leading delimiters) then `wcspbrk` (to find the token's end), returning NULL early if no token remains.

`wcsspn` iterates over the given widestring, with an innerloop over the acceptable chars, to exit as soon as it finds a char which is not in that set. It returns the length of that leading run.
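
The two functions above can be sketched roughly as follows. This is my own simplified rendition under the standard's documented behaviour, not glibc's source; the `sketch_` names are made up to avoid clashing with the real functions.

```c
#include <stddef.h>
#include <wchar.h>

/* Length of the leading run of `wcs` made up only of chars in `accept`. */
size_t sketch_wcsspn(const wchar_t *wcs, const wchar_t *accept)
{
    const wchar_t *p;
    for (p = wcs; *p != L'\0'; ++p) {
        const wchar_t *a;
        for (a = accept; *a != L'\0'; ++a)
            if (*p == *a)
                break;
        if (*a == L'\0')          /* *p isn't in the accept set: stop */
            break;
    }
    return (size_t)(p - wcs);
}

/* Tokenizer: skip delimiters via wcsspn, find the token end via wcspbrk. */
wchar_t *sketch_wcstok(wchar_t *wcs, const wchar_t *delim, wchar_t **save)
{
    if (wcs == NULL)
        wcs = *save;              /* continue from the previous call */
    if (wcs == NULL)
        return NULL;              /* nothing left to scan */
    wcs += sketch_wcsspn(wcs, delim);
    if (*wcs == L'\0') {          /* only delimiters remained */
        *save = NULL;
        return NULL;
    }
    wchar_t *end = wcspbrk(wcs, delim);
    if (end == NULL) {
        *save = NULL;             /* this was the last token */
    } else {
        *end = L'\0';             /* terminate the token in place */
        *save = end + 1;
    }
    return wcs;
}
```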

2/?


`wcsrchr` has a C macro-unrolled loop (defaulting to no unrolling) over the given widestring, storing a pointer to the last occurrence of the given char, which it returns upon reaching the nil terminator.

`wcspbrk` repeatedly calls `wcschr` on the set of acceptable chars, once for each char in the given widestring, until one matches.

`wcsncmp` iterates over the equal chars in the two given widestrings in lockstep (up to the given count), returning how the first nonequal pair compares, or 0 upon reaching the count or the end of the strings.
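
A rough sketch of that lockstep comparison, assuming we only care about the sign of the result (the standard only promises negative, zero, or positive). `sketch_wcsncmp` is my own name for it.

```c
#include <stddef.h>
#include <wchar.h>

/* Compare at most `n` widechars of `s1` & `s2` in lockstep. */
int sketch_wcsncmp(const wchar_t *s1, const wchar_t *s2, size_t n)
{
    for (; n > 0; --n, ++s1, ++s2) {
        if (*s1 != *s2)
            return *s1 < *s2 ? -1 : 1;   /* first differing pair decides */
        if (*s1 == L'\0')
            break;                       /* both strings ended together */
    }
    return 0;
}
```

Returning -1 or 1 rather than a subtraction sidesteps overflow when `wchar_t` is a signed 32bit type.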

`wcsncat` wraps `wcs[n]len` & `wmemcpy`.
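
That composition might look something like this sketch. It's an illustration of the idea rather than glibc's code, & it assumes `wcsnlen` is available (it's POSIX.1-2008, hence the feature-test macro).

```c
#define _POSIX_C_SOURCE 200809L  /* for wcsnlen */
#include <stddef.h>
#include <wchar.h>

/* Append at most `n` widechars of `src` to `dest`, then terminate.
   The caller must ensure `dest` has room for the result. */
wchar_t *sketch_wcsncat(wchar_t *dest, const wchar_t *src, size_t n)
{
    wchar_t *end = dest + wcslen(dest);  /* where dest currently ends */
    size_t len = wcsnlen(src, n);        /* how much of src to take */
    wmemcpy(end, src, len);
    end[len] = L'\0';
    return dest;
}
```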

3/?

`wcsmbs_getfct` wraps `gconv_find_transform` with validation. `wcsmbs_load_conv` wraps `wcsmbs_getfct` with added pre- & post- processing. `wcsmbs_clone_conv` wraps `get_gconv_fcts` with locked refcounts. `wcsmbs_named_conv` wraps `wcsmbs_getfct` to retrieve converters in both directions, erroring if either's unavailable. `nl_cleanup_ctype` retrieves the locale's `private.ctype` property, cleaning up if unavailable.

4/?

`mbsrtowcs_l` wraps the "gconv" internals to convert a string in a locale- (caller-provided) specified encoding into a widestring, specially handling when it's left to do the `malloc`. `mbrtoc16` works similarly, emitting UTF16 code units & choosing the input text encoding from the global locale.
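
From the caller's side, the public `mbsrtowcs` wrapper can be used like the sketch below. Passing NULL as the destination asks it to only count the resulting widechars. `count_widechars` is a hypothetical helper of mine, & the answer depends on the global locale's encoding.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: number of widechars `s` would convert to,
   or (size_t)-1 if `s` isn't valid in the current locale's encoding. */
size_t count_widechars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    const char *src = s;
    return mbsrtowcs(NULL, &src, 0, &state);
}
```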

`c16tomb` splits the specified 16bit char into one or two bytes before continuing via `wcrtomb`.

I wouldn't recommend using "wcsmbs"; it doesn't appear to have kept up after Unicode found more than 65,536 chars.

5/?

For "wcsmbs" functions which may have platform-dependent optimizations applied:

`btowc` wraps the "gconv" internals to convert a single byte (passed as an int) to a widechar according to the global locale's configured char encoding, with fastpaths taken for single-pass conversions & especially ASCII.
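
As a usage sketch, here's `btowc` widening a single-byte-encoded string one byte at a time. `widen_bytes` is a made-up helper, & this only makes sense where every char is one byte (such as ASCII text).

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: widen up to `cap - 1` bytes of `src` into `dst`.
   Returns the count written, or (size_t)-1 if a byte isn't a complete
   char on its own in the current locale (or if `cap` is 0). */
size_t widen_bytes(const char *src, wchar_t *dst, size_t cap)
{
    if (cap == 0)
        return (size_t)-1;
    size_t i;
    for (i = 0; src[i] != '\0' && i + 1 < cap; ++i) {
        wint_t wc = btowc((unsigned char)src[i]);
        if (wc == WEOF)
            return (size_t)-1;
        dst[i] = (wchar_t)wc;
    }
    dst[i] = L'\0';
    return i;
}
```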

`mbrtowc` also wraps the "gconv" internals to convert the next multibyte char of a string to a widechar according to the configured locale's charset, taking special error-reporting care.
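
That error reporting shows up in `mbrtowc`'s return value: `(size_t)-1` for an invalid sequence, `(size_t)-2` for an incomplete one, & 0 upon decoding the nil char. A sketch of a decoding loop honouring those (with `decode_count` being my own hypothetical helper):

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the chars in the first `n` bytes of `s`,
   or return -1 if the input is invalid or cut off mid-char. */
long decode_count(const char *s, size_t n)
{
    mbstate_t st;
    memset(&st, 0, sizeof st);
    long count = 0;
    while (n > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, n, &st);
        if (r == (size_t)-1 || r == (size_t)-2)
            return -1;            /* invalid or incomplete sequence */
        if (r == 0)
            break;                /* decoded the nil terminator */
        s += r;
        n -= r;
        ++count;
    }
    return count;
}
```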

6/?

`mbsnrtowcs` is another wrapper around the "gconv" internals handling sized strings, specially handling cases where it's responsible for `malloc`ing the output.

`wcrtomb` is a more straightforward "gconv" internals wrapper converting a single widechar to its multibyte sequence, given some state to propagate between consecutive calls.
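
To show how that state gets threaded between consecutive calls, here's a sketch converting a whole widestring char by char. `narrow_string` is a hypothetical name, & in real code `wcsrtombs` does this job for you.

```c
#include <limits.h>   /* MB_LEN_MAX */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert `src` into `dst` (capacity `cap` bytes).
   Returns bytes written excluding the terminator, or (size_t)-1 on an
   unconvertible char or insufficient space. */
size_t narrow_string(const wchar_t *src, char *dst, size_t cap)
{
    if (cap == 0)
        return (size_t)-1;
    mbstate_t st;
    memset(&st, 0, sizeof st);
    size_t used = 0;
    for (; *src != L'\0'; ++src) {
        char buf[MB_LEN_MAX];
        size_t r = wcrtomb(buf, *src, &st);   /* same state every call */
        if (r == (size_t)-1 || used + r + 1 > cap)
            return (size_t)-1;
        memcpy(dst + used, buf, r);
        used += r;
    }
    dst[used] = '\0';
    return used;
}
```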

`wcscasecmp` iterates lockstep over the equal (once lowercased) chars in both given widestrings, returns the case insensitive difference of the next char.
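
Sketched out, assuming `towlower` from "wctype" does the lowercasing & that only the sign of the result matters (`sketch_wcscasecmp` is my own name):

```c
#include <wchar.h>
#include <wctype.h>

/* Case-insensitive lockstep comparison of two widestrings. */
int sketch_wcscasecmp(const wchar_t *s1, const wchar_t *s2)
{
    wint_t c1, c2;
    do {
        c1 = towlower((wint_t)*s1++);
        c2 = towlower((wint_t)*s2++);
    } while (c1 != L'\0' && c1 == c2);    /* stop at the end or a mismatch */
    return c1 == c2 ? 0 : (c1 < c2 ? -1 : 1);
}
```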

7/?

`wcschrnul` iterates over a given widestring until it reaches the given char or nil, returning that pointer.

`wcschr`, with C macro-based loop unrolling, iterates over a given widestring & returns a pointer to the first occurrence of the given char, or NULL if it reaches the nil terminator first.
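
The difference between those two is only what happens when the char is missing, which this sketch tries to highlight. Building `wcschr` atop `wcschrnul` is my own simplification here, not how glibc structures it.

```c
#include <stddef.h>
#include <wchar.h>

/* Pointer to the first `wc` in `wcs`, or to the terminator if absent. */
const wchar_t *sketch_wcschrnul(const wchar_t *wcs, wchar_t wc)
{
    while (*wcs != L'\0' && *wcs != wc)
        ++wcs;
    return wcs;
}

/* Pointer to the first `wc` in `wcs`, or NULL if absent. */
const wchar_t *sketch_wcschr(const wchar_t *wcs, wchar_t wc)
{
    const wchar_t *p = sketch_wcschrnul(wcs, wc);
    return *p == wc ? p : NULL;   /* searching for nil finds the terminator */
}
```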

`wcscmp` iterates lockstep over the equal chars in both given widestrings, returning the difference between the next chars.

`wcscpy` wraps `wmemcpy` or uses its own macro-unrolled loop.

8/?

`wcslen` iterates over a given widestring in chunks of 4 & returns the char count.
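
Iterating "in chunks of 4" amounts to manual loop unrolling, roughly like this sketch (my own rendition, named `sketch_wcslen` to avoid clashing with the real thing):

```c
#include <stddef.h>
#include <wchar.h>

/* Count widechars before the terminator, testing 4 per loop iteration. */
size_t sketch_wcslen(const wchar_t *s)
{
    size_t len = 0;
    for (;;) {
        /* Each check runs only if the previous char wasn't nil,
           so we never read past the terminator. */
        if (s[len] == L'\0')     return len;
        if (s[len + 1] == L'\0') return len + 1;
        if (s[len + 2] == L'\0') return len + 2;
        if (s[len + 3] == L'\0') return len + 3;
        len += 4;
    }
}
```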

`wcsncasecmp` iterates lockstep over the equal-once-lowercased chars in two given widestrings (up to the given count), returning the lowercased difference between the subsequent chars.

`wcsnrtombs` (GNU-specific, very useful internally) again wraps the "gconv" internals to convert from a sized widestring to sized string according to the given locale, specially handling when it's responsible for the `malloc`. `wcsrtombs` works similarly.

9/?

`wcsstr` iterates over the given "haystack" widestring (including some inner fastpath loops) with a separate iterator over the given "needle" widestring to determine at which index that needle occurs in the haystack without backtracking on said haystack.

`wmemset` sets each index in the given sized widestring to the given character in chunks of 4.

`wmemcmp` compares the given number of chars in both given widestrings to determine the first differing character & returns that diff, in chunks of 4.

And `wmemchr` iterates over the given sized widestring to find the first occurrence of the given widechar & returns its pointer (or NULL if there isn't one).
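
Without the 4-chunk unrolling, the latter two boil down to something like this sketch (the `sketch_` names are mine, & only the sign of the comparison result is assumed to matter):

```c
#include <stddef.h>
#include <wchar.h>

/* First occurrence of `c` within the first `n` widechars of `s`. */
const wchar_t *sketch_wmemchr(const wchar_t *s, wchar_t c, size_t n)
{
    for (; n > 0; --n, ++s)
        if (*s == c)
            return s;
    return NULL;
}

/* Compare the first `n` widechars; nil chars get no special treatment. */
int sketch_wmemcmp(const wchar_t *a, const wchar_t *b, size_t n)
{
    for (; n > 0; --n, ++a, ++b)
        if (*a != *b)
            return *a < *b ? -1 : 1;
    return 0;
}
```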

---

Other more trivial wrapper functions which may be overridden by platform-specific optimizations include:

* `wmempcpy` which wraps `mempcpy`
* `wmemmove` wraps `memmove`
* `wmemcpy` wraps `memcpy`
* `wcsnlen` wraps `wmemchr`
* `wcsncpy` wraps `wcsnlen`, maybe `wmemset`, & `wmemcpy`
* `wcscat` wraps `wcslen` & `wcscpy`
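
To illustrate how thin a couple of these wrappers are, here's a sketch of `wcsnlen` & `wcscat` in terms of the functions listed. These are my own renditions of the idea, not glibc's code.

```c
#include <stddef.h>
#include <wchar.h>

/* Length of `s`, but never looking beyond `maxlen` widechars. */
size_t sketch_wcsnlen(const wchar_t *s, size_t maxlen)
{
    const wchar_t *end = wmemchr(s, L'\0', maxlen);
    return end != NULL ? (size_t)(end - s) : maxlen;
}

/* Append `src` to `dest`; the caller must ensure `dest` has room. */
wchar_t *sketch_wcscat(wchar_t *dest, const wchar_t *src)
{
    wcscpy(dest + wcslen(dest), src);
    return dest;
}
```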

11/12

* `wcpncpy` wraps `wcsnlen`, `wmemcpy`, & `wmemset`
* `wcpcpy` wraps `wcslen` & `wmemcpy`
* `mbsrtowcs` wraps `mbsrtowcs_l` for the global locale
* `mbsinit` checks whether the state passed between some conversion functions I described earlier is in its initial state.
* & `mbrlen` wraps `mbrtowc`.
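
And likewise a sketch for `wcpcpy` & `mbrlen`, again as my own illustration. The `internal` fallback state is an assumption mirroring how the standard describes `mbrlen` behaving when handed a NULL state.

```c
#include <stddef.h>
#include <wchar.h>

/* Copy `src` into `dest`, returning a pointer to dest's terminator. */
wchar_t *sketch_wcpcpy(wchar_t *dest, const wchar_t *src)
{
    size_t len = wcslen(src);
    wmemcpy(dest, src, len + 1);   /* + 1 to include the terminator */
    return dest + len;
}

/* Byte length of the next multibyte char in `s`, looking at most `n` bytes. */
size_t sketch_mbrlen(const char *s, size_t n, mbstate_t *ps)
{
    static mbstate_t internal;     /* zero-initialized fallback state */
    return mbrtowc(NULL, s, n, ps != NULL ? ps : &internal);
}
```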

12/12 Fin for "wcsmbs"
