[arch-general] Can I prevent Pacman from running hooks?

Discussion:

Tobias Hunger via arch-general

2018-10-13 08:11:10 UTC

Hi Arch Community,

I have scripts that will install a set of Arch Linux machines for me
with all the tweaks I want. These scripts run pacman a lot to install
bits and pieces. Usually the script will run pacman to install one
package, then configure that package, and then proceed to install
the next (set of) packages. This works great, but a noticeable part
of the run time is spent on running pacman hooks (e.g. to update the
man-db and similar things).

So I am wondering: Is it possible to stop pacman from running hooks
during package installation? I do know when it is "safe" to not run
hooks (when the system is not complete yet) and when I need to run all
of them (right after the system has all packages installed that it
will have).

So far I run pacman with a --hookdir that contains symlinks to
/dev/null named like some of the more expensive hooks that tend to
take long to complete. But a --no-hooks option to pacman would be
great for my use case.
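
The masking approach described above could look roughly like the following. This is a minimal sketch, not the poster's actual script: the function name mask_hooks, the directory /tmp/masked-hooks and the hook name example.hook are placeholders invented for illustration. It relies only on the behaviour documented in alpm-hooks(5), where a hook overridden by a symlink to /dev/null is disabled, and on pacman's --hookdir option.

```shell
#!/bin/sh
# mask_hooks DIR NAME...
# Create DIR if needed and, for each NAME, a symlink DIR/NAME -> /dev/null.
# (mask_hooks is a hypothetical helper, not part of pacman.)
mask_hooks() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for name in "$@"; do
        # -f replaces an existing entry of the same name
        ln -sf /dev/null "$dir/$name" || return 1
    done
}

# Hypothetical usage; directory and hook name are placeholders, and each
# name must match the file name of the hook that is to be masked:
#   mask_hooks /tmp/masked-hooks example.hook
#   pacman --hookdir /tmp/masked-hooks -S some-package
```

Only the hooks named explicitly are masked; every other hook still runs as usual.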

Is it possible to run pacman without it triggering hooks?

Best Regards,
Tobias

Doug Newgard via arch-general

2018-10-13 08:15:10 UTC

On Sat, 13 Oct 2018 10:11:10 +0200

Post by Tobias Hunger via arch-general
Hi Arch Community,
I have scripts that will install a set of arch linux machines for me
with all the tweaks I want. These scripts run pacman a lot to install
bits and pieces. Usually the script will run pacman to install one
package and then configure that package and then proceed to install
the next (set of) packages. This works great, but a noticeable part
of the run time is spent on running pacman hooks (e.g. to update the
man-db and similar things).
So I am wondering: Is it possible to stop pacman from running hooks
during package installation? I do know when it is "safe" to not run
hooks (when the system is not complete yet) and when I need to run all
of them (right after the system has all packages installed that it
will have).
So far I run pacman with a --hookdir that contains symlinks to
/dev/null named like some of the more expensive hooks that tend to
take long to complete. But a --no-hooks option to pacman would be
great for my use case.
Is it possible to run pacman without it triggering hooks?
Best Regards,
Tobias

Some hooks take into account the specific files that were installed, so you
cannot run them later. Why not just install everything at once?

Tobias Hunger via arch-general

2018-10-13 12:12:11 UTC

On Sat, Oct 13, 2018, 10:15 Doug Newgard via arch-general <

Post by Doug Newgard via arch-general
Some hooks take into account the specific files that were installed, so you
cannot run them later.

I am aware of that. This is an optimization that avoids running hooks
needlessly. All the hooks I read so far are safe to run at any time.

Post by Doug Newgard via arch-general
Why not just install everything at once?

I run an immutable and stateless setup, so I cannot actually update systems
(they are immutable, after all). Instead, my CI generates images for my
systems every couple of hours. Those eventually replace the images I am
running, which gets me to a new, updated system state.

The CI builds a very basic system and configures it. It then creates more
specialized systems based on that base image (e.g. one for VMs, one for
containers and another one for bare metal). It then continues to branch out
from those until all the actual systems I want to install are reached.
Overall this approach (somewhat inspired by docker files) saves a lot of
time over just creating each system from scratch, and it also makes sure all
systems have the same basic features: they all inherit from a common
base, after all.

It would suffice to run all hooks in the leaves of the tree of systems
(just before writing the actual HDD image files) and skip them for all
others.

Best Regards,
Tobias

Jonathon Fernyhough

2018-10-13 12:25:33 UTC

Post by Tobias Hunger via arch-general
I run an immutable and stateless setup. So I cannot actually update systems
(they are immutable after all). So I end up having my CI generate images
for my systems every couple of hours. Those will then replace the images I
run eventually, getting me to a new updated system state.

As a slight side-step, might it be possible to generate the images in a
ramdisk/tmpfs? That should remove disk I/O as a bottleneck.

Another option, given you know the expected end state, could be to
bypass pacman and extract the package file content into place directly
(e.g. with tar), then run whatever hooks you want afterwards. Yes, you'd
lose package management within the images but that doesn't matter with
an immutable image. (Though, there may be side-effects I haven't
considered here.)

Tobias Hunger via arch-general

2018-10-13 16:36:32 UTC

On Sat, Oct 13, 2018 at 2:25 PM Jonathon Fernyhough

Post by Jonathon Fernyhough

As a slight side-step, might it be possible to generate the images in a
ramdisk/tmpfs? That should remove disk I/O as a bottleneck.

That is possible.

Post by Jonathon Fernyhough
Another option, given you know the expected end state, could be to
bypass pacman and extract the package file content into place directly
(e.g. with tar), then run whatever hooks you want afterwards. Yes, you'd
lose package management within the images but that doesn't matter with
an immutable image. (Though, there may be side-effects I haven't
considered here.)

The price is losing package management. I do want to keep that so
that pacman drags in dependencies.

Best Regards,
Tobias

Doug Newgard via arch-general

2018-10-13 15:15:10 UTC

On Sat, 13 Oct 2018 14:12:11 +0200

Post by Tobias Hunger via arch-general
It would suffice to run all hooks in the leaves of the tree of systems
(just before writing the actual HDD image files) and skip them for all
others.

Again, no, it wouldn't. The hooks would not run correctly.

Tobias Hunger via arch-general

2018-10-13 16:41:45 UTC

On Sat, Oct 13, 2018 at 5:15 PM Doug Newgard via arch-general

Post by Doug Newgard via arch-general
On Sat, 13 Oct 2018 14:12:11 +0200

Post by Tobias Hunger via arch-general
It would suffice to run all hooks in the leaves of the tree of systems
(just before writing the actual HDD image files) and skip them for all
others.

Again, no, it wouldn't. The hooks would not run correctly.

What makes you say so?

I see nothing in the alpm-hooks man page that implies that this would not work.

Best Regards,
Tobias

Doug Newgard via arch-general

2018-10-13 16:44:37 UTC

On Sat, 13 Oct 2018 18:41:45 +0200

Post by Tobias Hunger via arch-general
On Sat, Oct 13, 2018 at 5:15 PM Doug Newgard via arch-general

Post by Doug Newgard via arch-general
On Sat, 13 Oct 2018 14:12:11 +0200

Post by Tobias Hunger via arch-general
It would suffice to run all hooks in the leaves of the tree of systems
(just before writing the actual HDD image files) and skip them for all
others.

Again, no, it wouldn't. The hooks would not run correctly.

What makes you say so?
I see nothing in the alpm-hooks man page that implies that this would not work.
Best Regards,
Tobias

Because, as I said earlier, hooks can and do take into account the specific
files being installed. If you install one package that needs a specific hook,
running that hook later will have the correct file list and will not run
correctly.

Doug Newgard via arch-general

2018-10-13 16:49:53 UTC

On Sat, 13 Oct 2018 11:44:37 -0500

Post by Doug Newgard via arch-general
On Sat, 13 Oct 2018 18:41:45 +0200

Post by Tobias Hunger via arch-general
On Sat, Oct 13, 2018 at 5:15 PM Doug Newgard via arch-general

Post by Doug Newgard via arch-general
On Sat, 13 Oct 2018 14:12:11 +0200

Post by Tobias Hunger via arch-general
It would suffice to run all hooks in the leaves of the tree of systems
(just before writing the actual HDD image files) and skip them for all
others.

Again, no, it wouldn't. The hooks would not run correctly.

What makes you say so?
I see nothing in the alpm-hooks man page that implies that this would not work.
Best Regards,
Tobias

That should read "will not have the correct file list"

Tobias Hunger via arch-general

2018-10-13 16:58:53 UTC

On Sat, Oct 13, 2018 at 6:44 PM Doug Newgard via arch-general

Post by Doug Newgard via arch-general
Because, as I said earlier, hooks can and do take into account the specific
files being installed. If you install one package that needs a specific hook,
running that hook later will have the correct file list and will not run
correctly.

Most hooks are just run and do not care about any input. Some of the
hook scripts take a list from stdin. By the way: It would be nice if
that was documented in the alpm-hooks man page.

I see no reason why I cannot generate this file list right when I
want to run the hooks. In my setup I can ignore anything but the
Install hooks. For those I just need to apply the glob patterns in the
Target fields. That should not be too hard.
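
A rough sketch of what "apply the glob patterns in the Target fields" could mean in practice is shown below. Everything in it is an assumption made for illustration: hook_targets is a hypothetical helper, it treats every Target line as a file pattern (so it only makes sense for hooks whose trigger type is file based), it ignores negated targets and the Operation field, and it matches against files currently present under a root directory rather than against the files of one transaction. That last difference is exactly the caveat raised earlier in the thread. Patterns are matched with the shell's case statement, in which * also matches "/".

```shell
#!/bin/sh
# hook_targets HOOKFILE [ROOT]
# Print every non-directory path under ROOT (relative, no leading slash)
# that matches at least one "Target = ..." pattern in HOOKFILE.
# Hypothetical helper; an approximation, not pacman's own matching code.
hook_targets() {
    hook=$1
    root=${2:-/}
    # Extract the right-hand side of every "Target = ..." line.
    patterns=$(sed -n '/^[[:space:]]*Target[[:space:]]*=/{
        s/^[^=]*=[[:space:]]*//
        s/[[:space:]]*$//
        p
    }' "$hook") || return 1
    (
        cd "$root" || exit 1
        # Store the patterns as positional parameters, one per line.
        set --
        while IFS= read -r pattern; do
            [ -n "$pattern" ] && set -- "$@" "$pattern"
        done <<EOF
$patterns
EOF
        find . ! -type d | sed 's|^\./||' | while IFS= read -r path; do
            for pattern in "$@"; do
                case $path in
                    # unquoted on purpose: treat the value as a pattern
                    $pattern) printf '%s\n' "$path"; break ;;
                esac
            done
        done
    )
}
```

The output order follows find and is therefore unspecified; pipe it through sort if a stable list is needed.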

Best Regards,
Tobias

Andrew Gregory via arch-general

2018-10-13 18:39:49 UTC

Post by Tobias Hunger via arch-general
Some of the hook scripts take a list from stdin. By the way: It
would be nice if that was documented in the alpm-hooks man page.

"NeedsTargets Causes the list of matched trigger targets to be passed
to the running hook on stdin."
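
To illustrate the quoted sentence: a hook that sets NeedsTargets has its command started with the matched targets on standard input, one per line. The sketch below shows how such a command could consume that list. The function name process_targets and its output format are made up for this example; a real hook would act on each target instead of printing it.

```shell
#!/bin/sh
# process_targets: read one target per line from stdin, as a command run by
# a NeedsTargets hook would receive them, and report what was read.
# (Hypothetical example; it only prints the targets.)
process_targets() {
    count=0
    while IFS= read -r target; do
        [ -n "$target" ] || continue
        printf 'target: %s\n' "$target"
        count=$((count + 1))
    done
    printf 'total: %d\n' "$count"
}

# Hypothetical manual invocation with a hand-written list:
#   printf 'usr/share/info/a.info\n' | process_targets
```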

Tobias Hunger via arch-general

2018-10-13 20:14:35 UTC

Post by Andrew Gregory via arch-general
"NeedsTargets Causes the list of matched trigger targets to be passed
to the running hook on stdin."

Ah, great, I missed that part!

Eli Schwartz via arch-general

2018-10-14 01:23:25 UTC

Post by Tobias Hunger via arch-general
I have scripts that will install a set of arch linux machines for me
with all the tweaks I want. These scripts run pacman a lot to install
bits and pieces. Usually the script will run pacman to install one
package and then configure that package and then proceed to install
the next (set of) packages. This works great, but a noticeable part
of the run time is spent on running pacman hooks (e.g. to update the
man-db and similar things).

If there are specific hooks you don't want to use, you can use the
default HookDir (as documented in alpm-hooks(5) *and* in pacman.conf(5),
this is "/etc/pacman.d/hooks/") to mask them with symlinks to /dev/null.

You've responded that you need to batch each image installation process
using incremental runs inspired by docker, which I sort of understand,
but I don't really see how delaying execution until the end is a good
idea here.

Consider things like the texinfo package, which installs a hook to run
many install-info processes, once for each file in usr/share/info/ that
the hook detects. I guess you could write your own custom handling for
this and just try to install the whole directory at the end, but you'd
need to judge based on the hook, and adapt to a unique situation. And
it's irrelevant, since delaying execution will not provide benefits over
doing it every time a file is installed -- on the contrary, delaying
execution means you repeat some of the base image work separately for
each child image.
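
For reference, the "custom handling" mentioned above (installing the whole info directory at the end) might be sketched as follows. This is an assumption-laden illustration, not the texinfo hook itself: rebuild_info_dir and the INSTALL_INFO override variable are hypothetical names, and the loop simply calls install-info with an info file and the dir index file for every regular file in the directory.

```shell
#!/bin/sh
# rebuild_info_dir [INFODIR]
# Run install-info once for every regular file in INFODIR, registering it in
# INFODIR/dir. INSTALL_INFO may name a different command (hypothetical knob,
# useful for dry runs, e.g. INSTALL_INFO=echo).
rebuild_info_dir() {
    infodir=${1:-/usr/share/info}
    for f in "$infodir"/*; do
        [ -f "$f" ] || continue
        case ${f##*/} in
            dir|dir.old) continue ;;   # skip the index file itself
        esac
        ${INSTALL_INFO:-install-info} "$f" "$infodir/dir" 2>/dev/null
    done
}
```

As the message points out, doing this once per leaf image repeats work that the base image could have done a single time.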

You should override hooks by hand, when you know what they do, know that
they don't require targets, and know that you're handling it yourself at
the end.

Post by Tobias Hunger via arch-general
So I am wondering: Is it possible to stop pacman from running hooks
during package installation? I do know when it is "safe" to not run
hooks (when the system is not complete yet) and when I need to run all
of them (right after the system has all packages installed that it
will have).
So far I run pacman with a --hookdir that contains symlinks to
/dev/null named like some of the more expensive hooks that tend to
take long to complete. But a --no-hooks option to pacman would be
great for my use case.

Expensive hooks, like the man-db hook? I find that annoying on my
standard system as well, which is why I use
https://github.com/graysky2/mandb-ondemand

I'm extremely skeptical that we'll add a --no-hooks option.

We have a --noscriptlet option, but that's because there's no other way
to stop a scriptlet from doing specific things you don't want done.

Hooks were designed to be configurable and able to be masked on an
individual level via symlinks. An option to prevent hooks from running
would therefore serve no purpose except to say "running hooks at all is
undesirable to me", which I don't think is the statement we want to make...

Post by Tobias Hunger via arch-general
Is it possible to run pacman without it triggering hooks?

You can add a NoExtract directive to pacman.conf, which prevents hooks
from being installed to the system in the first place. Although I don't
know how you'd determine what hooks should exist, in order to handle
their actions yourself.
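
A minimal sketch of the NoExtract idea, under stated assumptions: add_noextract is a hypothetical helper that prints a copy of a pacman.conf with one NoExtract line added directly below the [options] header, and the pattern usr/share/libalpm/hooks/* assumes that packages ship their hooks in that directory. The original file is not modified.

```shell
#!/bin/sh
# add_noextract CONF PATTERN
# Print CONF with "NoExtract = PATTERN" added right after the [options] line.
# (Hypothetical helper; writes to stdout only.)
add_noextract() {
    awk -v pat="$2" '
        { print }
        /^\[options\]/ { print "NoExtract = " pat }
    ' "$1"
}

# Hypothetical usage; the output path is a placeholder:
#   add_noextract /etc/pacman.conf 'usr/share/libalpm/hooks/*' > /tmp/pacman-nohooks.conf
#   pacman --config /tmp/pacman-nohooks.conf -S some-package
```

Hooks that are already installed on the system are not affected by this; NoExtract only applies to files extracted by later transactions.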

You can also compile your own pacman package using:

CPPFLAGS='-DSYSHOOKDIR=\"/i/dont/want/to/run/any/hooks/\"' ./configure

But I don't see any point to this.

--
Eli Schwartz
Bug Wrangler and Trusted User