Bat file to search local Audacity Manual

DickN · August 4, 2016, 10:03pm

Split from https://forum.audacityteam.org/t/central-location-to-save-plugins-from-audacity-updates/43075/14

Here’s a Windows .bat script to search a local copy of the Audacity manual for a given word or phrase. It doesn’t make inferences or support wildcards as Google does, and it doesn’t support regular expression searches. It could be modified to do regular expression searches, and wildcards would then be supported too, although the syntax might not be as simple as with Google.

The results are displayed as a file list, which is essentially a subject list so that alone gives a context for the keyword/phrase match. Files can then be selected for display on the default browser. I don’t know how to open multiple tabs simultaneously, but if additional files are selected once the browser is open it may add tabs rather than open new windows. It works this way with Mozilla and Chrome, the only browsers I’ve tested with it so far.
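For anyone who wants to try the same idea outside cmd.exe, here is a minimal sketch in Python of a literal, case-insensitive phrase search over a folder of manual pages. It is not a translation of AudManSearch.bat: the function names are made up for illustration, and stripping the HTML tags before matching is a choice of this sketch that the .bat script may not share.

```python
import re
import webbrowser
from pathlib import Path


def page_text(html: str) -> str:
    """Crudely strip tags and collapse whitespace so only visible text is searched."""
    return " ".join(re.sub(r"<[^>]+>", " ", html).split())


def find_pages(pages: dict[str, str], phrase: str) -> list[str]:
    """Return the names of pages whose text contains the literal phrase (case-insensitive)."""
    needle = " ".join(phrase.lower().split())
    return sorted(name for name, html in pages.items() if needle in page_text(html).lower())


def load_pages(folder: Path) -> dict[str, str]:
    """Read every .html file in the folder into a name -> contents mapping."""
    return {p.name: p.read_text(encoding="utf-8", errors="ignore") for p in folder.glob("*.html")}


def open_pages(folder: Path, names: list[str]) -> None:
    """Ask the default browser to show each selected page, in a new tab where possible."""
    for name in names:
        webbrowser.open_new_tab((folder / name).resolve().as_uri())
```

Calling `find_pages(load_pages(folder), "label track")` with `folder` pointing at your own help folder returns the matching file names, and `open_pages` hands the chosen ones to the default browser. Whether they arrive as tabs or as new windows is still up to the browser.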

Installation consists of editing the file to insert the pathname of your help folder, removing the ‘.txt’ extension from the file name, and saving the .bat file to the Desktop (or elsewhere with a shortcut on the desktop).

Happy Hunting

DickN
AudManSearch.bat.txt (3.14 KB)

DickN · August 5, 2016, 7:42pm

Just a heads-up, there’s a sequel coming for AudManSearch.bat. I eliminated the temp file and made a few other changes. I’ll post it as soon as I fix a bug.

DickN · August 7, 2016, 12:13am

Here’s the sequel, rev 1.1 of AudManSearch.bat. It eliminates the temp file and adds Rotating Selection and New Search. I wanted to add multiple keyword searches (AND rule), but there’s something I need to learn first about how to get nested ‘for’ loops to work in Windows batch script. Maybe I should read a primer on Java instead - this is pretty basic.
AudManSearch.bat.txt (5.12 KB)

Gale_Andrews · August 7, 2016, 1:38pm

Thanks, Dick.

Of course, if the user runs the script from a drive other than C:, it only searches the current directory.

I don’t know what to do about that unless you specify C through Z.

Gale

DickN · August 9, 2016, 2:06pm

Interesting. I tried running it from a thumb drive and, as you said, it didn’t find files in the man directory. Oddly, it didn’t get an error (which would have triggered the reminder to insert the pathname of the help folder) when it changed directory to \manual\man in that path (which includes the drive letter).

If you’re running from a command line and type ‘j:audmansearch’, it works (j: is my thumb drive). I remember now from the DOS days, CD <drive>:<path> changes the working directory on that drive without switching to it, so it only affects explicit references to that drive. OK, the fix is obvious.

Henceforth I’ll include running from a thumb drive in my testing, which will also expedite testing on 3 different systems. I doubt this would be an issue for most users (especially if they launch it from an icon on the desktop), so will save the fix for next rev which one way or another will include multi-keyword search.

Thanks for the bug report!

DickN · August 12, 2016, 12:59am

Here’s rev 2.0. It supports multiple keyword/phrase searches, returning files that contain all of them.
The drive:path bug is fixed, and IMO the script has a tidier interaction with the user.

Note that in rev 1, ‘Q’ just meant quit the display list and then start a new search if you want to. In rev 2, ‘Q’ means quit and ‘N’ means start a new search.

There are some options for getting the path info:

Hard-coding (default option) is still in there and it works from any drive:path.
If the environment variable AudacityHelp is pre-initialized, it uses this with no editing required.

The other 2 options require removing 'rem 's from the file:

Option 2:
If run from a shortcut, the drive:path can be put in the “start in” parameter in the shortcut.

Option 3:
If the .bat file is saved in the Audacity folder, it finds the help folder relative to itself.
I like this version but I didn’t make it the default because it’s inflexible about where to put the .bat file.
AudManSearch.bat.txt (7.4 KB)

DickN · August 26, 2016, 4:44pm

I have a “3quel” (rev 3.0).

It does inverse searches: any files containing a negated term are excluded from the results, rather like a real search engine but without the spelling checker. It also gives a count of the matches, just in case you were wondering how long that list is.
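The AND rule and the negated terms can be sketched in a few lines of Python. This is only an illustration of the matching rule described above, not the logic of the .bat file itself: the `-term` and quoted-phrase syntax follow the examples used later in this thread, and `parse_query` and `search` are hypothetical names.

```python
import shlex


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into required terms and negated (leading '-') terms.

    Straight double quotes group words into one phrase, e.g. -"portable settings".
    """
    required, excluded = [], []
    for token in shlex.split(query.lower()):
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            required.append(token)
    return required, excluded


def search(pages: dict[str, str], query: str) -> list[str]:
    """Return pages that contain every required term and none of the negated ones."""
    required, excluded = parse_query(query)
    hits = []
    for name, text in pages.items():
        lowered = text.lower()
        if all(t in lowered for t in required) and not any(t in lowered for t in excluded):
            hits.append(name)
    return sorted(hits)
```

The match count is simply `len(search(pages, query))`. Matching here is by plain substring, so a page containing only "setting" does not count as a hit for "settings".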

The path setting is somewhat automated. To find the help folder, the .bat script checks for a pre-set environment variable first, then the “start in” path given in the shortcut, then the .bat file’s own folder. The actual path used is shown at startup. The only case where any editing of the .bat file is needed is if the user wants to hard-code the path, which then overrides the above.
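The lookup order can be pictured as a short fallback chain. The sketch below is in Python rather than batch and is an assumption about the shape of the logic only: it treats each source as a candidate path and accepts the first one that exists as a folder, which may be simpler than what the script really checks. The function name is hypothetical.

```python
from pathlib import Path


def resolve_help_folder(hard_coded, env_value, start_in, script_dir):
    """Return the first candidate that exists as a directory, or None.

    Order follows the post above: a hard-coded path overrides everything,
    then the environment variable, then the shortcut's "start in" folder,
    then the folder holding the script.
    """
    for candidate in (hard_coded, env_value, start_in, script_dir):
        if candidate and Path(candidate).is_dir():
            return Path(candidate)
    return None
```

In a real script the arguments might come from `os.environ.get("AudacityHelp")`, `Path.cwd()` (a shortcut's “start in” folder becomes the working directory) and `Path(__file__).parent`. Printing the returned path at startup gives the same feedback the .bat file provides.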

DickN
AudManSearch.bat.txt (14.5 KB)

steve · August 26, 2016, 5:51pm

I don’t use Windows, but there are several search applications available for all platforms. One that I tried recently is a Java application that works on Windows, OS X and Linux (requires Java to be installed), called Puggle. It’s a bit old (2010), but I found that it works very well and when set up to search only the manual, it is very fast.

DickN · August 26, 2016, 11:29pm

Reminds me of my aside at the end of my Aug 6 post out of frustration with cmd.exe - “maybe I should read a primer on java instead - this is pretty basic”.

I should look into getting an IDE for batch files - I wonder if they really reflect the actual behavior of a batch program running without the IDE.

My next iteration was going to be a specialized version simply because I have use for a similar search on label tracks among an archive of Audacity projects. Think I’ll save myself some work and try Puggle for that. I don’t really want to index everything though - I assume that means creating a database of keyword occurrences so the files don’t have to be re-scanned for each search. That would be wasteful in a library that doesn’t get searched very often, especially when only a small part of each .aup file is pertinent to the searches. Can Puggle target particular tagged blocks in .xml files?

steve · August 27, 2016, 8:16am

Puggle creates a database on first use. From first launch of Puggle, setting the search directory, waiting for it to create the database, to getting my first search result, took perhaps 30 seconds. After that, search results are virtually instant. It has “extractors” for “documents”, “music” and “pictures”. I’m not sure which document formats it supports, but it has no problem with HTML, which is what we need for the manual. I think the “music” and “pictures” searches extract metadata from common audio / image formats. There’s even built-in help, but I’ve not used that. The other nice “feature” is that it runs without elevated privileges (just ordinary user account).

For searching data that hardly ever changes, a database is the way to go. It scans the files once, then you get very fast searches whenever you need them. The database would only need to be rebuilt when you update the manual, and that only takes about 10 seconds on my old budget laptop for the entire manual.
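To make the “scan once, search many times” idea concrete, here is a minimal sketch of an inverted index in Python. It is not how Puggle stores its database (that is not documented in this thread); it only shows the general technique, with hypothetical function names.

```python
import re
from collections import defaultdict


def add_page(index: dict[str, set[str]], name: str, text: str) -> None:
    """Record every word of one page; existing entries are left untouched."""
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        index[word].add(name)


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Scan all pages once and map each word to the set of pages containing it."""
    index = defaultdict(set)
    for name, text in pages.items():
        add_page(index, name, text)
    return index


def lookup(index: dict[str, set[str]], words: list[str]) -> list[str]:
    """Pages containing every word: intersect the per-word page sets."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []
```

After the one-off `build_index`, each `lookup` is a handful of set intersections with no file access, which is why results feel instant. Adding a new page only needs `add_page` for that page, not a re-scan of the old ones.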

Other than that, all I know about Puggle is what is written on its website http://puggle.sourceforge.net/

There are also a number of open source desktop search apps written in Python, and I found this article which may be of interest, about writing your own search engine using Python, HTML and JavaScript http://www.zackgrossbart.com/hackito/search-engine-python/

The obvious advantage of Java or Python is cross-platform support.
The advantage of making a custom search engine for the manual is that it could be optimised for how the manual is written, giving different weightings for when a search term is found in a header, anchor, file name … as well as the total number of occurrences in the file.
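As a rough illustration of such weighting, the Python sketch below scores a page higher when the term appears in the file name, the title or a heading, and adds a point per occurrence in the text. The weight values and the `score` function are invented for the example and would need tuning against the real manual; anchors are left out for brevity.

```python
import re

# Illustrative weights only; nothing in the manual dictates these numbers.
WEIGHTS = {"filename": 5.0, "title": 4.0, "heading": 3.0, "body": 1.0}


def score(filename: str, html: str, term: str) -> float:
    """Score one page for one term, weighting where the term was found."""
    term = term.lower()
    page = html.lower()
    total = 0.0
    if term in filename.lower():
        total += WEIGHTS["filename"]
    titles = re.findall(r"<title[^>]*>(.*?)</title>", page, re.S)
    headings = re.findall(r"<h[1-6][^>]*>(.*?)</h[1-6]>", page, re.S)
    total += WEIGHTS["title"] * sum(term in t for t in titles)
    total += WEIGHTS["heading"] * sum(term in h for h in headings)
    # Every occurrence in the visible text (headings included) adds a little more.
    body = re.sub(r"<[^>]+>", " ", page)
    total += WEIGHTS["body"] * body.count(term)
    return total
```

Sorting the matching pages by `score(...)` in descending order would put pages whose headings mention the term ahead of pages that only mention it in passing.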

Gale_Andrews · August 27, 2016, 12:05pm

Java is considered something of a security risk (especially on Windows of course as the main “target” platform for malware writers).

So ensure your Java settings allow automatic updates and require you to be asked before untrusted Java applications will run. More information at https://java.com/en/security/.

Gale

DickN · August 27, 2016, 1:04pm

My reservation re: using an index to search label tracks in .aup files is the amount of irrelevant data an index file would contain unless there’s a way to target particular tags in xml format documents. If there is, and adding new files to the library doesn’t require re-scanning all the old ones, then yippee! - that’s the way to go.

Brainstorm: Maybe I’ll learn enough about Java to write an extractor to do just that.
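Whatever tool ends up doing the indexing, the extraction step itself is small. The sketch below is in Python rather than Java and rests on an assumption about the .aup layout: that labels are stored as `label` elements with `t` (start time) and `title` attributes inside a `labeltrack` element. Check this against one of your own project files before relying on it. `label_titles` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def label_titles(aup_xml: str) -> list[tuple[float, str]]:
    """Return (start time, title) for every label element, ignoring all other data."""
    root = ET.fromstring(aup_xml)
    labels = []
    for elem in root.iter():
        # Strip any "{namespace}" prefix so the match works with or without xmlns.
        if elem.tag.rsplit("}", 1)[-1] == "label":
            labels.append((float(elem.get("t", "0")), elem.get("title", "")))
    return labels
```

Feeding only these titles to an index keeps the irrelevant parts of each project file out of it entirely.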

Just for an estimate, how big is the index file for the Audacity manual?

Gale - Yes, I do keep Java up to date. I wouldn’t be too concerned about running a 2010 Java app on any Java much newer than the app, however, especially an open-source one.

steve · August 27, 2016, 1:39pm

The database for Puggle seems to be 4.6MB after fully indexing the manual, but note that this is designed as a general purpose Desktop search, so it is probably storing a lot more information than we need. Even so, 4.6MB is a small file for modern computers.

Gale_Andrews · August 27, 2016, 1:55pm

Not really the point, Dick.

The point is that having Java installed creates an attack vector if an attacker can find a way to get a malicious Java application installed or running on your computer, notably if you have Java enabled in your web browser.

So if you don’t need Java to use any web pages you visit, disable Java in your browser. This is not the same as Javascript, of course.

Gale

DickN · August 27, 2016, 5:47pm

I currently have Java 11.101.2. I do frequent at least one website that requires it (astroviewer.com) and I have it set “ask to activate” in Mozilla.

But thanks for making me look up the difference between Java and Javascript. I thought it was all Javascript and that compiled Java was an executable. But to be platform independent, the runtime module has to be some meta-code that runs on a virtual machine which is written for each hardware platform.

So I won’t be able to just write a custom extractor and edit it into Puggle - I’ll have to get the compiler or the IDE and either compile Puggle along with it or, if it has provision for running external modules, create one with the required interface.

But wait a minute! Puggle downloads are for specific platforms. Wouldn’t that imply that they run on the target processor? Guess I’ll just have to see…

DickN · August 28, 2016, 12:40am

I installed Puggle on my Vista system. It puts two .exe files in Program Files (x86), so I doubt it’s using the Java platform. I also downloaded the binary version and it contains the same two files which match these in a file compare.

First time I started it, there was a panel with my user folder and space to add others. I didn’t want to index the whole computer, but by the time I finished unchecking all the file types I didn’t want to include (and there was no way to add .aup to the assortment) I found it was already generating its database. I had only .txt, .pdf and .html selected. I tried a search, and there was no way to tell it which folder to search, only the class of files (documents, pictures, music and all). So it found hits from everywhere. Very quickly, I must concede.

The status shown at the bottom of the panel said optimizing was finished, which I take to mean it had finished indexing. When I closed the window, the process was still resident and taking ~100MB of RAM. I found a Puggle icon in the system tray but all it showed was “open…”, which I tried and it opened the same window as the shortcut on the desktop. This time I went into Help, but Help doesn’t seem to work. I tried the File menu and there was an Exit so I tried that. From the File menu, it wanted me to confirm my intent, which I did. Puggle was still in memory.

Then I tried a right-click on the system tray icon and there was Exit, so I tried that. The system tray icon disappeared, but not the memory footprint. I tried deleting the .Puggle data folder to force it to start over with the indexing so I could edit the path before it got too far. I launched Puggle again from the shortcut, only to find that I can’t edit the initial default path - it’s going to do the whole user account again. I closed the window. Then I noticed there were now two copies of Puggle in memory, each taking up ~100MB. I gave it a right-click, Exit from the system tray icon, and sure enough both copies were still there in memory. I gave up and stopped them both in Task Manager.

Steve speaks well of the Apple version, but I’m not very impressed with the PC version. It’s not in Startup, so at least it shouldn’t wake up with the OS.

Steve, did you find a way to append the rest of the help folder path to the default path so it doesn’t index your whole machine? I guess for some people it might make sense to include all user files in a search, but for me that’s way too much stuff.

steve · August 28, 2016, 1:45am

The Linux version.

steve · August 28, 2016, 1:54am

Yes, it was very straightforward.
On first launch it opens this window:

As you can see, the default path is my Home folder, so I removed that (select the path then click the “- Re…” button), then added the folder containing the manual.

Then click the “Close” button.
It then builds the index and is ready to go.

DickN · August 29, 2016, 10:44am

Arrrgh!

As my Dad used to say, “It could have bit me”!

Thanks, Steve

DickN

DickN · August 29, 2016, 6:00pm

Interesting. I now have Puggle installed with just …help\manual\man indexed.

Match counts for each search string (Puggle vs. AudManSearch, with comments):

Search string: settings
Puggle gets 190 matches; AudManSearch gets 113 matches. (Puggle gets 77 extra matches for “settings” - see *below)

Search string: portable
Puggle gets 17 matches; AudManSearch gets 17 matches.

Search string: settings portable
Puggle gets 17 matches; AudManSearch gets 10 matches. (Puggle’s extra “settings” matches overlapped the other 7 “portable”s)

Search string: “portable settings”
Puggle gets 1 match; AudManSearch gets 1 match.

Search string: portable settings -“portable settings”
Puggle gets 16 matches; AudManSearch gets 9 matches. (Both excluded the 1 match of “portable settings”)

Search string: -settings portable
Puggle gets 0 matches; AudManSearch gets 7 matches. (Puggle’s extra “settings” matches overlapped the other 7 “portable”s)

Search string: settings -portable
Puggle gets 20 matches; AudManSearch gets 103 matches. (Puggle: 20 vs 17 without the negation! I don’t get it. There were 190 “settings”s and 17 “portable”s. Swapping the order of the terms makes no difference, nor does putting them in quotes, nor does using ‘!’ instead of ‘-’.)

Search string: settings portable “portable settings”
Puggle gets 1 match; AudManSearch gets 1 match.

*I found that at least some (I’m guessing ‘all’) of the extra matches Puggle gets to settings and “settings” are because it accepts partial matches. “setting”, for example, is taken as a match to “settings”. Using quotes doesn’t force exact match.
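The gap between the two counts is what you would expect when one tool matches the literal string and the other reduces words to a common stem first. The Python sketch below illustrates that difference with a deliberately crude stemming rule. It is a guess at the kind of mechanism involved, not a description of what Puggle actually does, and the function names are hypothetical.

```python
import re


def literal_hits(pages: dict[str, str], term: str) -> list[str]:
    """Pages containing the term as a literal substring (case-insensitive)."""
    return sorted(n for n, t in pages.items() if term.lower() in t.lower())


def crude_stem(word: str) -> str:
    """Toy rule for illustration only: drop one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def stemmed_hits(pages: dict[str, str], term: str) -> list[str]:
    """Pages containing any word that shares a stem with the term."""
    target = crude_stem(term.lower())
    return sorted(
        n for n, t in pages.items()
        if any(crude_stem(w) == target for w in re.findall(r"[a-z]+", t.lower()))
    )
```

With stemming, a page that only says “setting” counts as a hit for “settings”, so the stemmed count can only be equal to or larger than the literal one for a single-word search.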

Too bad Help doesn’t work. The subject list suggests the program is much more versatile than what I’m seeing. Or it could be that the missing Help text resides on the developer’s computer as a “to do” list. I don’t understand some of what Puggle is doing.

What’s to like about Puggle?

It shows a sample of the matching text for each match.

It has stars - The number of stars highlighted for each match give some metric of relevance, maybe whether it’s in the title, in a keyword list, or number of occurrences in the text.

It doesn’t have the issue my .bat script has with getting control back after initially starting the HTML viewer (Mozilla).

It’s pretty fast. OTOH, the .bat script only takes a few seconds to scan the whole folder the first time it’s run. If you run it again (or do a new search), Windows has all the files cached anyway and then most searches take a second or less.

The only behavior I can confidently say is a bug (in the Windows version) is the “dog gone” memory leak at termination. If you launch Puggle and exit (not just close the window - Puggle will tell you it’s already running if you try to start another instance) N times, you’re left with N instances in memory taking 90-100 MB each, which you have to remove with Task Manager.

DickN