This release contains a large amount of performance work. Specifically, Python 3.7 or higher sees regressions in performance relative to CPython fixed. Many cases where macros were turned into functions have been found and resolved. For 3.10 specifically, we take advantage of new opportunities for optimization. And generally, avoiding DLL calls will benefit execution times on platforms where the Python DLL is used, most prominently Windows.
This release also adds new features, specifically custom reports, as well as tools that aid with adding Nuitka package configuration input data by listing DLLs and data files.
With multidist we see a brand new ability to combine several programs into one, which will become very useful for packaging multiple binaries without the overhead of multiple distributions.
dependency_injector package. Fixed in 1.3.1 already.
passlib.apache package. Fixed in 1.3.2 already.
networkx. Fixed in 1.3.3 already.
PATH environment variable anymore, which could lead to externally provided compilers and internal winlibs gcc clashing on Windows, but should be a general problem. Fixed in 1.3.4 already.
cefpython3 package. Fixed in 1.3.4 already.
webview package versions. Fixed in 1.3.4 already.
__file__ to None during multi phase imports, which we then didn’t update anymore, however that is necessary. Fixed in 1.3.4 already.
match cases where an alternative had no condition associated. Fixed in 1.3.5 already.
__module__ attribute because there is code out there that identifies standard loaders by looking at this value, but crashes without it. Fixed in 1.3.5 already.
importlib_metadata backport were using themselves to load their __version__ attribute. Added a workaround for it, since in Nuitka it doesn’t work until after loading the module.
pygments.styles module. Fixed in 1.3.6 already.
Ellipsis too soon during tree building. It is not quite like True and False. Fixed in 1.3.6 already.
numpy on macOS didn’t work inside an application bundle anymore. Fixed in 1.3.7 already.
os.add_dll_directory fail to work. Fixed in 1.3.8 already.
sqlalchemy. Fixed in 1.3.8 already.
importlib.resources.as_file to work well with it. Fixed in 1.3.8 already.
cv2 was not working with the opencv-python-headless variant. Package name and distribution name are not a 1:1 mapping for all things. Fixed in 1.3.8 already.
tls_client package.
requests module does, it was only adding a dependency on the resolved name, but not requests itself. The import however was still done at runtime on requests which then didn’t work. This was only visible if only these aliases to other modules were used.
.dylib when scanning for data files unlike all other DLL suffixes.
mplcairo.
.bin unlike in accelerated mode. However, this didn’t work well for packages which have binaries colliding with the package name. Therefore the suffix is now added in this case too.
platform_utils.paths. It is guessing the wrong path for included data files with Nuitka.
sound_lib, selecting by OS and architecture.
importlib.metadata.metadata for use at runtime we need to use both package name and distribution name to create it, or else it failed to work. Packages like opencv-python-headless can now work with this too.
tkinterweb on Windows. Other platforms will need work to be done later.
UI: Added a new option for listing package data files. It is intended for analyzing standalone issues and outputs all data files of a given package name.
UI: Added a new option for listing package DLL files. This is also intended for analyzing standalone issues.
Reports: Module usages, successful or not, are now included in the compilation report. Checking which ones are not-found might help to recognize issues.
Multidist: You can now experimentally create binaries with multiple entry points. At runtime, one of multiple __main__ modules will be executed. To use it, pass multiple --main=some_main.py arguments. If the binary name is then changed, a different variant gets executed.
Using --main only once replaces the previous use of the positional argument and does not use multidist at all.
Multidist is compatible with onefile, standalone, and mere acceleration. It cannot be used for module mode obviously.
For deployment, this can avoid the duplication that comes with shipping multiple distributions.
For wheels, we will probably change those with multiple entry points to compile multidist executables, so that we avoid Python script entry points there. But this has not yet been done.
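The idea of picking a variant by binary name can be sketched in plain Python. This is only an illustration of the concept, not how Nuitka implements multidist; the function select_entry_point, the table ENTRY_POINTS, and the two main functions are hypothetical names made up for this example.

```python
import os


def main_one():
    return "running main_one"


def main_two():
    return "running main_two"


# Hypothetical table: one entry per program that was combined into the binary.
ENTRY_POINTS = {"main_one": main_one, "main_two": main_two}


def select_entry_point(argv0, entry_points, default):
    """Pick an entry point from the name the binary was started under."""
    name = os.path.basename(argv0)
    # Drop an executable suffix such as ".exe" or ".bin", if present.
    stem, _suffix = os.path.splitext(name)
    # Unknown names fall back to the default entry point.
    return entry_points.get(stem, entry_points[default])


if __name__ == "__main__":
    import sys

    print(select_entry_point(sys.argv[0], ENTRY_POINTS, "main_one")())
```

Renaming or copying the resulting program to main_two would then select the second variant, which is the kind of behavior described above.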
--onefile-child-grace-time the hard way. This avoids hangs of processes that fail to properly shut down.
sys.path manipulations in the Yaml configuration with new global-sys-path import hack.
--report-template where the user can provide a Jinja2 template to make their own reports.
no_asserts, no_docstrings and no_annotations now. These can be used to limit rules to be applied only when these optional modes are active.
Not all packages will work in these modes, but they can often be enhanced to work with relatively little patching. This allows limiting these patches to only where they are necessary.
sparse and through that Numba in the scipy package, reducing its distribution footprint. Part of 1.3.3 already.
trimesh package. Part of 1.3.3 already.
shap package. Part of 1.3.8 already.
xgboost docstring dependencies, such that --python-flag=no_docstrings can be used with this package.
frozenset and empty tuple need no copies
This also speeds up copies of non-empty tuples by avoiding that size checking branch in construction with Python 3.10 or higher.
hasattr/getattr/setattr on dynamic attribute names were done. This was making the tree traversal during optimization slower than necessary.
Another shortcoming was that for some nodes, some values are optional, where for others, they are not. Some values are a tuple actually, while most are nodes only. However, dealing with this generically was also slower than necessary.
The new code now enforces children types during creation and update, it rejects unexpected None values for non-optional children, and it provides generated code to do this in the fastest way possible, although surely some more improvements will come here.
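The difference between generic and generated child handling can be sketched roughly as follows. This is a minimal illustration under assumptions, not Nuitka's actual generated code; the class names, the subnode_ attribute prefix, and the get_children method are hypothetical.

```python
class GenericNode:
    """Generic variant: child names are data, so every access is dynamic."""

    named_children = ("left", "right")

    def __init__(self, **children):
        for name in self.named_children:
            # Dynamic attribute name, resolved at runtime for each child.
            setattr(self, "subnode_" + name, children.get(name))

    def get_children(self):
        return tuple(
            getattr(self, "subnode_" + name) for name in self.named_children
        )


class SpecializedNode:
    """Specialized variant, as a code generator might emit it per node kind."""

    __slots__ = ("subnode_left", "subnode_right")

    def __init__(self, left, right=None):
        # "left" is declared non-optional here, so None is rejected early.
        if left is None:
            raise ValueError("child 'left' must not be None")
        self.subnode_left = left
        # "right" is declared optional here, so None is acceptable.
        self.subnode_right = right

    def get_children(self):
        # Unrolled access: no loop over names and no dynamic lookups.
        return (self.subnode_left, self.subnode_right)
```

Both variants expose the same children, but the specialized one knows its attribute names and optionality up front, which is the property that makes generated code attractive for a hot path like tree traversal.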
Also, when abstractly executing the tree, rather than generically visiting all children, the code is now unrolled, and some modes were added where a node can indicate properties, e.g. auto_compute_handling = "final,no_raise" will tell the code generator that this expression never raises in the computation, and is final, i.e. doesn’t have any code to evaluate, because it cannot be optimized any further.
Also, the way checkers previously worked, a dictionary lookup had to be done for every node creation and for every child update. This is now hard coded for the few nodes that actually want to convert values on the fly, and we might make a difference in the future for optional checkers, such that these are only run in debug mode.
These changes brought about much faster compilation, however the elephant in the room will still be merging value traces, and scalability problems remain there.
dict.update, etc. now provide type shapes. From these type shapes, mixins for the result value type are picked automatically. Previously these shapes were added manually. In some cases, they were even missing. In a few cases, where the type is dependent on the Python version, we do not currently do this though, so this needs more work, but expanding the coverage got easier in this way.
os.path.isdir which was making it relatively slow and wasting 5% compile time on the IO being done. The check got enhanced and most often replaced with using the knowledge from the original import scan, eliminating this time.
.c files, but compiled generators and compiled cells codes were not yet done like this, making life unnecessarily harder for the compiler and linker. This should also allow more optimization for some codes.
PyObject_RichCompareBool API, as we have our own comparison functions that are faster and faster to call without crossing the DLL barrier.
PyIndex_Check which has become an API in 3.8, and was as a result not inlined anymore, so that a DLL barrier had to be crossed, making all kinds of multiplication and subscript/index operations slower.
PyNumber_Index API with our own code. As of 3.10 it enforces a conversion to long that for Nuitka is not a good thing to do in all places. But also due to the DLL barrier it was potentially slow to call, and is used a lot, and we can drop the checks that are useless for Nuitka.
PyImport_GetModule for looking up imported modules from sys.modules, rather look it up from interpreter internals, also this was using subscript functions, when this is always a dictionary.
PyImport_GetModuleDict and instead have our own API to get this quicker.
TODO about inlining the API function used, so we can be faster in a relatively common operation. For every exception handler, we had to do one API call there.
PyType_IsSubType replacement these faster to use and avoid the API call.
int value startup initialization.
On Python 3.9 or higher we can get small int values directly from the interpreter, and with 3.11 they are accessible as global values.
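For background, the sharing of small int values can be observed from plain Python. This is a small sketch that relies on a CPython implementation detail (preallocated objects for small integers, currently -5 through 256); the helper shares_identity is a hypothetical name used only here.

```python
def shares_identity(value):
    """Return True if two separately created ints of this value are one object."""
    # Build both values at runtime so the compiler cannot share a constant.
    first = int(str(value))
    second = int(str(value))
    return first is second


if __name__ == "__main__":
    for candidate in (-5, 0, 256):
        print(candidate, shares_identity(candidate))
```

Values in the small range come back as the very same object each time, so an extra de-duplication layer on top of the interpreter has nothing left to merge for them.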
Also, we no longer de-duplicate small int values through our cache, since there is no use in this, saving a bunch of startup time. And we can create the values with our own API replacement, which works during startup already and saves API calls, as these can be relatively slow. And especially for the small values, this benefits from not having to create them.
bytes value startup initialization.
On Python 3.10 or higher, we can create these values ourselves without an API call, avoiding its overhead.
Also, we no longer de-duplicate small bytes values through our cache, because that is already done by the API and our replacement, so this was just wasting time.
slice object values with Python 3.10 or higher
On Python 3.10 or higher, we can create these values ourselves without an API call, avoiding its overhead.
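To see why slice objects show up so often in Python 3, note that subscript syntax with a colon reaches __getitem__ as a slice object. The Recorder class below is a hypothetical helper that simply returns the key it receives.

```python
class Recorder:
    """Return whatever key the subscript operation passes in."""

    def __getitem__(self, key):
        return key


recorder = Recorder()

# The colon syntax is delivered as a slice object.
received = recorder[1:5]

data = [10, 20, 30, 40, 50]

# Both spellings select the same elements.
same_result = data[1:3] == data[slice(1, 3)]
```

Every such subscript therefore creates a slice object at runtime, which is why creating them cheaply matters.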
These are important for Python3, because a[x:y] in the general case has to use a[slice(x,y)] on that version, making this somewhat relevant to performance in some cases.
str built-in with API calls
For common cases, this avoids API calls. We mostly have this so that print style tests do not make these API calls, as we strive to remove all API calls for given programs.
PyErr_NormalizeException that will avoid the API call. It may still call the PyObject_IsSubclass API, for which we have only started replacement work, but this is already a step in the right direction.
wheel and setuptools to install by adding a pyproject.toml that addresses a warning of pip. Part of 1.3.6 release already.
when conditions that raise, output which it was exactly. Part of 1.3.3 already.
--disable-console for GUI packages. Otherwise using that, they just deprive themselves of ways to get error information.
ccache report was not enforced. Part of 1.3.7 release already.
--onefile-tempdir-spec that has since been made not OS specific, with even the OS specific name being removed.
site-packages or __pycache__ folders. This should make it easier to use --include-data-file=./**.qml:. when you have a virtualenv living in the same folder.
pywebview plugin that pertain to the DLLs and data files to package configuration.
tuple values rather than list values from the tree building stage and node optimization creating new nodes. This allows us to drop conversions previously done inside of nodes.
PySide6 or commercial PySide2.
The focus of this release was first a major restructuring of how children are handled in the node tree. The generated code opens up the possibility of many more scalability improvements in the coming releases. The pure iteration speed for the node tree will make compile times for the Python part even shorter in coming releases. Scalability will be a continuous focus for some releases.
Then, avoiding API calls is a huge benefit for many platforms that are otherwise at a disadvantage. This work has only just started. We aim at getting more complex programs to make next to none of these calls. So far, only some tests run without them after program start, which is of course big progress. We will continue with this in future releases as well.
Catching up on problems that previous migrations have not discovered is also a huge step toward restoring the performance supremacy that was no longer there in extreme cases.
The Yaml package configuration work is bearing fruit. More people than ever before have been able to contribute changes for anti-bloat or missing dependencies.
Some parts of the Python 3.11 work have positively influenced things, e.g. with the frame cleanup. The focus of the next release cycle shall be to add support for it. Right now, the cleanup of generator frames needs to be finished, so that they become better and work with 3.11 at the same time. Where possible, work to support 3.11 was also conducted as a cleanup action, or as a reduction of technical debt.
All in all, it is fair to say that this release is a big leap forward in all kinds of ways.