This release contains a large amount of performance work. Specifically, Python 3.7 and higher had regressed in performance relative to CPython, and these regressions are now fixed. Many cases where macros had been turned into functions were found and resolved. For 3.10 specifically, we take advantage of new opportunities for optimization. And generally, avoiding DLL calls benefits execution times on platforms where the Python DLL is used, most prominently Windows.
This release also adds new features, specifically custom reports, plus tools that aid with adding Nuitka package configuration input data by listing DLLs and data files.
With multidist we get a brand new ability to combine several programs into one, which will become very useful for packaging multiple binaries without the overhead of multiple distributions.
dependency_injector package. Fixed in 1.3.1 already.
passlib.apache package. Fixed in 1.3.2 already.
networkx. Fixed in 1.3.3 already.
PATH environment variable anymore, which could lead to externally provided compilers and internal winlibs gcc clashing on Windows, but should be a general problem. Fixed in 1.3.4 already.
cefpython3 package. Fixed in 1.3.4 already.
webview package versions. Fixed in 1.3.4 already.
__file__ to None during multi phase imports, which we then didn’t update anymore, however that is necessary. Fixed in 1.3.4 already.
match cases where an alternative had no condition associated. Fixed in 1.3.5 already.
__module__ attribute because there is code out there that identifies standard loaders by looking at this value, but crashes without it. Fixed in 1.3.5 already.
importlib_metadata backport were using themselves to load their __version__ attribute. Added a workaround for it, since in Nuitka it doesn’t work until after loading the module.
pygments.styles module. Fixed in 1.3.6 already.
Ellipsis too soon during tree building. It is not quite like True and False. Fixed in 1.3.6 already.
numpy on macOS didn’t work inside an application bundle anymore. Fixed in 1.3.7 already.
os.add_dll_directory fail to work. Fixed in 1.3.8 already.
sqlalchemy. Fixed in 1.3.8 already.
importlib.resources.as_file to work well with it. Fixed in 1.3.8 already.
cv2 was not working with the opencv-python-headless variant. Package name and distribution name are not a 1:1 mapping for all things. Fixed in 1.3.8 already.
tls_client package.
requests module does, it was only adding a dependency on the resolved name, but not requests itself. The import however was still done at runtime on requests, which then didn’t work. This was only visible if only these aliases to other modules were used.
.dylib when scanning for data files unlike all other DLL suffixes.
mplcairo.
.bin unlike in accelerated mode. However, this didn’t work well for packages which have binaries colliding with the package name.
Therefore now the suffix is added in this case too.
platform_utils.paths. It is guessing the wrong path for included data files with Nuitka.
sound_lib, selecting by OS and architecture.
importlib.metadata.metadata for use at runtime we need to use both package name and distribution name to create it, or else it failed to work. Packages like opencv-python-headless can now work with this too.
tkinterweb on Windows. Other platforms will need work to be done later.
UI: Added new option for listing package data files. This is for use with analyzing standalone issues, and will output all files that are data files for a given package name.
UI: Added new option for listing package DLL files. This is also for use with analyzing standalone issues.
Reports: Module usages, successful or not, are now included in the compilation report. Checking which ones are not-found may help to recognize issues.
Multidist: You can now experimentally create binaries with multiple entry points. At runtime, one of multiple __main__ modules will be executed. To use it, pass multiple --main=some_main.py arguments. If the binary is then renamed, a different variant is executed when it runs.
Using --main only once replaces the previous use of the positional argument and does not use multidist at all.
Multidist is compatible with onefile, standalone, and mere acceleration. It obviously cannot be used with module mode.
For deployment, this can avoid the duplication that comes with shipping multiple distributions.
For wheels, we will probably change those with multiple entry points to compile multidist executables, so that we avoid Python script entry points there. But this has not yet been done.
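The renaming behavior can be pictured with a plain Python sketch. This is not how Nuitka implements multidist; it only illustrates the idea of choosing an entry point from the name the program was started under. The names `main_alpha`, `main_beta`, `ENTRY_POINTS` and `select_entry_point` are hypothetical and exist only for this illustration.

```python
import os
import sys


def main_alpha():
    return "alpha running"


def main_beta():
    return "beta running"


# Hypothetical table: program base name -> entry point function.
ENTRY_POINTS = {"alpha": main_alpha, "beta": main_beta}


def select_entry_point(argv0, entry_points=ENTRY_POINTS, default="alpha"):
    """Pick the entry point from the name the program was invoked under."""
    # Strip the directory and a possible extension such as ".exe" or ".bin".
    name = os.path.splitext(os.path.basename(argv0))[0]
    # Unknown names fall back to the default entry point in this sketch.
    return entry_points.get(name, entry_points[default])


if __name__ == "__main__":
    print(select_entry_point(sys.argv[0])())
```

Copying or renaming the same file to `beta` would then run `main_beta` in this sketch, which mirrors the "different variant" behavior described above.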
--onefile-child-grace-time the hard way. This avoids hangs of processes that fail to shut down properly.
sys.path manipulations in the Yaml configuration with new global-sys-path import hack.
--report-template where the user can provide a Jinja2 template to make their own reports.
no_asserts, no_docstrings and no_annotations now. These can be used to limit rules to be only applied when these optional modes are active.
Not all packages will work in these modes, but they can often be enhanced to work with relatively little patching. This allows limiting these patches to only where they are necessary.
sparse and through that Numba in the scipy package, reducing its distribution footprint. Part of 1.3.3 already.
trimesh package. Part of 1.3.3 already.
shap package. Part of 1.3.8 already.
xgboost docstring dependencies, such that --python-flag=no_docstrings can be used with this package.
frozenset and empty tuple need no copies
This also speeds up copies of non-empty tuples by avoiding that size checking branch in construction with Python 3.10 or higher.
hasattr/getattr/setattr on dynamic attribute names were done. This was making the tree traversal during optimization slower than necessary.
Another shortcoming was that for some nodes, some values are optional, where for others, they are not. Some values are a tuple actually, while most are nodes only. However, dealing with this generically was also slower than necessary.
The new code now enforces children types during creation and update, it rejects unexpected None values for non-optional children, and it provides generated code to do this in the fastest way possible, although surely some more improvements will come here.
Also, when abstractly executing the tree, rather than generically visiting all children, the code now just unrolls this, and there are even some modes added where a node can indicate properties, e.g. auto_compute_handling = "final,no_raise" will tell the code generator that this expression never raises in the computation, and is final, i.e. doesn’t have any code to evaluate, because it cannot be optimized any further.
Also, the way checkers previously worked, a dictionary lookup had to be done for every node creation and for every child update. This is now hard coded for the few nodes that actually want to convert values on the fly, and in the future we might treat optional checkers differently, such that these are only run in debug mode.
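The child handling described above can be pictured with a small hand-written sketch. It is not Nuitka's generated code; the classes `Node`, `Constant` and `Call`, the helpers `check_node` and `check_node_tuple`, and the `subnode_` attribute naming are assumptions made for illustration only. The point is that each node class uses fixed attribute names, checks its children on creation and on update, and enumerates its children without a generic loop over names.

```python
class Node:
    """Minimal stand-in for a tree node; not Nuitka's real base class."""


class Constant(Node):
    def __init__(self, value):
        self.value = value


def check_node(name, value, optional=False):
    """Reject values that are not nodes; None only passes if optional."""
    if value is None:
        if optional:
            return None
        raise ValueError(f"child {name!r} must not be None")
    if not isinstance(value, Node):
        raise TypeError(f"child {name!r} must be a node")
    return value


def check_node_tuple(name, values):
    """Convert to a tuple on the fly and check every element."""
    values = tuple(values)
    for value in values:
        check_node(name, value)
    return values


class Call(Node):
    """Hand-written here; imagine this body being generated per node kind."""

    def __init__(self, called, args, default=None):
        # Fixed attribute names, so no dynamic getattr/setattr is needed.
        self.subnode_called = check_node("called", called)
        self.subnode_args = check_node_tuple("args", args)
        self.subnode_default = check_node("default", default, optional=True)

    def set_child_called(self, value):
        # Updates go through the same check as creation.
        self.subnode_called = check_node("called", value)

    def get_visitable_nodes(self):
        # Unrolled traversal instead of a generic loop over child names.
        result = (self.subnode_called,) + self.subnode_args
        if self.subnode_default is not None:
            result += (self.subnode_default,)
        return result
```

In this sketch, passing a list for `args` is accepted and stored as a tuple, while `None` for the non-optional `called` child raises immediately at creation time rather than surfacing later during traversal.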
These changes brought about much faster compilation, however the big elephant in the room will still be merging value traces, and scalability problems remain there.
dict.update, etc. now provide type shapes. From these type shapes, mixins for the result value type are picked automatically. Previously these shapes were added manually. In some cases, they were even missing. In a few cases, where the type is dependent on the Python version, we do not currently do this though, so this needs more work, but expanding the coverage got easier in this way.
os.path.isdir which was making it relatively slow and wasting 5% compile time on the IO being done. The check got enhanced and is most often replaced with using the knowledge from the original import scan, eliminating this time.
.c files, but compiled generators and compiled cells codes were not yet done like this, making life unnecessarily harder for the compiler and linker. This should also allow more optimization for some codes.
PyObject_RichCompareBool API, as we have our own comparison functions that are faster, and faster to call since no DLL barrier is crossed.
PyIndex_Check which has become an API in 3.8, and as a result was not inlined anymore, so a DLL barrier had to be crossed, making all kinds of multiplication and subscript/index operations slower.
PyNumber_Index API with our own code. As of 3.10 it enforces a conversion to long that for Nuitka is not a good thing to do in all places. But also due to the DLL barrier it was potentially slow to call, and it is used a lot, and we can drop the checks that are useless for Nuitka.
PyImport_GetModule for looking up imported modules from sys.modules, rather look it up from interpreter internals, also this was using subscript functions, when this is always a dictionary.
PyImport_GetModuleDict and instead have our own API to get this quicker.
TODO about inlining the API function used, so we can be faster in a relatively common operation.
For every exception handler, we had to do one API call there.
PyType_IsSubType replacement these faster to use and avoid the API call.
int value startup initialization.
On Python 3.9 or higher we can get small int values directly from the interpreter, and with 3.11 they are accessible as global values.
Also we no longer de-duplicate small int values through our cache, since there is no use in this, saving a bunch of startup time. And we can create the values with our own API replacement, which works during startup already and saves API calls, as these can be relatively slow. And especially for the small values, this benefits from not having to create them.
bytes value startup initialization.
On Python 3.10 or higher, we can create these values ourselves without an API call, avoiding its overhead.
Also we no longer de-duplicate small bytes values through our cache, because that is already done by the API and our replacement, so this was just wasting time.
slice object values with Python 3.10 or higher
On Python 3.10 or higher, we can create these values ourselves without an API call, avoiding its overhead.
These are important for Python3, because a[x:y] in the general case has to use a[slice(x,y)] on that version, making this somewhat relevant to performance in some cases.
str built-in with API calls
For common cases, this avoids API calls. We mostly have this so that print style tests do not make these API calls, as we strive to remove all API calls for given programs.
PyErr_NormalizeException that will avoid the API call. It may still call the PyObject_IsSubclass API, for which we have only started replacement work, but this is already a step ahead in the right direction.
wheel and setuptools to install by adding a pyproject.toml that addresses a warning of pip. Part of 1.3.6 release already.
when conditions that raise, output which it was exactly. Part of 1.3.3 already.
--disable-console for GUI packages. Otherwise using that, they just deprive themselves of ways to get error information.
ccache report was not enforced. Part of 1.3.7 release already.
--onefile-tempdir-spec that has since been made not OS specific, with even the OS specific name being removed.
site-packages or __pycache__ folders. This should make it easier to use --include-data-file=./**.qml:. when you have a virtualenv living in the same folder.
pywebview plugin that pertain to the DLLs and data files to package configuration.
tuple values rather than list values from the tree building stage and node optimization creating new nodes. This allows us to drop conversions previously done inside of nodes.
PySide6 or commercial PySide2.
The focus of this release was first a major restructuring of how children are handled in the node tree. The generated code opens up the possibility of many more scalability improvements in the coming releases. The pure iteration speed for the node tree will make compile times for the Python part even shorter in coming releases. Scalability will be a continuous focus for some releases.
Then, avoiding API calls is a huge benefit for many platforms that are otherwise at a disadvantage. This work has also only just started. We aim at getting more complex programs to make next to none of these calls. So far only some tests run without them after program start, which is of course big progress already. We will continue this in future releases as well.
Catching up on problems that previous migrations had not discovered is also a huge step towards restoring the performance supremacy that was no longer there in extreme cases.
The Yaml package configuration work is bearing fruit. More people have been able to contribute changes for anti-bloat or missing dependencies than ever before.
Some parts of the Python 3.11 work have positively influenced things, e.g. with the frame cleanup. The focus of the next release cycle shall be to add support for it. Right now, generator frames need a cleanup to be finished, so that they become better and work with 3.11 at the same time. Where possible, work to support 3.11 was also conducted as a cleanup action, or as a reduction of technical debt.
All in all, it is fair to say that this release is a big leap forward in all kinds of ways.