Compare two Excel files

There is a hidden tool if you are running the Professional version of Office. You can find it here:

Depending on the version of Office you are running, the folder “Office15” may have a different number. The tool lets you select two files and view them side by side, with a graphical overview of the differences.

PDF OCR with Fedora 24 and Tesseract

Run the following commands:

Now you can convert a file like this:

If you don’t install the tesseract-osd package, the conversion still works, but the following error message appears:

Mount Amazon S3 on Fedora 24

There is no ready-made package, so you need to download and compile the code yourself. First, install some development libraries by executing the following commands:

Then you need to create the directory where you want to mount your bucket:

Now you need to prepare your credentials. Both the AwsAccessKeyId and the AwsSecretAccessKey are needed:

Now you can mount your bucket:

Unfortunately, if something goes wrong (for example, wrong credentials), it doesn’t show an error message; the folder is just empty. In such a case you can run the command in debug mode to see more clearly what is going on:

Paperless Office with Djvu

Scanning

To scan everything that comes in, you usually need two scanners: a flatbed scanner and an ADF (automatic document feeder) scanner. I use …, which also serves as my printer, and …, which is only a scanner. I can scan most documents with …, which is quite fast because I can load multiple pages (maybe 10) at a time. The flatbed scanner is used for documents that are either too thick (books) or too large (posters) for the other scanner.

Sometimes I scan directly to PDF, but sometimes I scan each page to a separate JPEG file. This has the advantage that I can remove pages that are not needed, and if one page is not scanned well, I only need to rescan that page and replace it. I usually scan at 300 dpi, which gives good quality.
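If the pages are kept as separate JPEG files, their order matters when they are combined. Here is a small Python sketch that assumes a hypothetical naming scheme such as scan_1.jpg, scan_2.jpg and scan_10.jpg. Sorting by the embedded number keeps scan_10.jpg after scan_9.jpg. A plain alphabetical sort, such as the one a shell glob like *.jpg produces, does not.

```python
import re
from typing import List


def page_order(names: List[str]) -> List[str]:
    """Sort scan file names so that embedded page numbers compare numerically."""

    def key(name: str) -> list:
        # re.split with a capturing group alternates text, number, text, ...
        # so every odd index holds a run of digits.
        parts = re.split(r"(\d+)", name)
        return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

    return sorted(names, key=key)


pages = ["scan_10.jpg", "scan_2.jpg", "scan_1.jpg", "scan_9.jpg"]
print(sorted(pages))      # alphabetical: scan_1, scan_10, scan_2, scan_9
print(page_order(pages))  # numeric:      scan_1, scan_2, scan_9, scan_10
```

The sorted list can then be handed to whatever tool does the JPEG-to-PDF conversion. A replaced page keeps its position as long as it keeps its file name.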

To convert multiple JPEG files into a PDF document, the following command can be used:

OCR

To recognize the text in the files and make it searchable, I convert them to the djvu format, which is easier to handle with open-source tools than PDF. The conversion can be done like this:

Now we can run OCR on all those djvu files:

Remark: as most of my documents are in German, I specified that language; otherwise the umlauts (ä, ö, ü) are not recognized. The command also doesn’t work if the filename contains any non-ASCII characters.
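Because the OCR step fails on non-ASCII filenames, one workaround is to rename the files before running it. The minimal Python sketch below assumes German transliteration (ä→ae, ö→oe, ü→ue, ß→ss) is acceptable and drops any other non-ASCII characters after stripping accents. The function names are hypothetical.

```python
import unicodedata
from pathlib import Path

# German transliterations, applied before the generic fallback below
GERMAN = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                        "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"})


def ascii_filename(name: str) -> str:
    """Return an ASCII-only version of a file name (hypothetical helper)."""
    name = name.translate(GERMAN)
    # Split remaining accented letters into base letter + combining mark ...
    decomposed = unicodedata.normalize("NFKD", name)
    # ... and drop everything that is still not ASCII
    return decomposed.encode("ascii", "ignore").decode("ascii")


def rename_to_ascii(folder: str) -> None:
    """Rename *.djvu files in `folder` whose names contain non-ASCII characters."""
    for path in list(Path(folder).glob("*.djvu")):
        target = path.with_name(ascii_filename(path.name))
        # Skip names that are already ASCII and never overwrite an existing file
        if target != path and not target.exists():
            path.rename(target)


print(ascii_filename("Rechnung_März_café.djvu"))  # Rechnung_Maerz_cafe.djvu
```

Files whose ASCII name would collide with an existing file are left untouched, so it is worth checking for leftovers afterwards.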

Other useful commands

OCR for a single PDF file:

Converting images directly to DJVU

Converting DJVU back to PDF

Splitting DJVU files into single pages

Converting DJVU single pages to images

```
ddjvu -format=tiff page.djvu page.tiff
```

Extract specific pages from a djvu file and save them as images:

Show the text that was recognized by OCR:

https://mail.gnome.org/archives/tracker-list/2010-August/msg00020.html

Migrating from Owncloud to Nextcloud on Fedora 24

Installing the Nextcloud client on Fedora is not trivial, as you first need to download the code, install some dependencies and compile it.

Without installing the dependencies you would get the following error messages:

  • No CMAKE_CXX_COMPILER could be found.
  • Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the
    system variable OPENSSL_ROOT_DIR (missing: OPENSSL_LIBRARIES
    OPENSSL_INCLUDE_DIR) (Required is at least version "1.0.0")
  • No package 'sqlite3' found. Could NOT find SQLite3 (missing: SQLITE3_LIBRARIES SQLITE3_INCLUDE_DIRS)
    (Required is at least version "3.8.0")
  • Could NOT find Qt4 (missing: QT_QTWEBKIT_INCLUDE_DIR QT_QTWEBKIT_LIBRARY) (found suitable version "4.8.7", minimum required is "4.7.0")
    Qt QTWEBKIT library not found.
  • Could NOT find QtKeychain (missing: QTKEYCHAIN_LIBRARY
    QTKEYCHAIN_INCLUDE_DIR)

Before you can start it, you need to add the following line to ~/.bashrc and restart the system:

Otherwise you get the error “nextcloud: error while loading shared libraries: libnextcloudsync.so.0: cannot open shared object file: No such file or directory”.

Also don’t forget to uninstall the old client:

On the first start, the Nextcloud Connection Wizard opens. Enter the server address, username and password. Then select the folders that you want to sync and, as the local folder, choose the same folder you configured with ownCloud. If you do this, the option “Keep local data” appears; if you check it, the existing data is taken over.

Using a .NET FileWatcher with a DFS properly

There is not much information on the internet about practical experience with the Microsoft .NET FileSystemWatcher class, so here is some advice.

How it works

It is basically a wrapper around the Windows API function ReadDirectoryChangesW.

Possible restrictions with remote locations

There might be several limitations in place that you usually cannot see:

  • Latency increases the chance of a buffer overflow.
  • The buffer size might be artificially restricted to less than 64 kB.
  • The server might not implement the ReadDirectoryChangesW function at all, or might implement it incorrectly (for example, if you are dealing with a Linux/Unix server).
  • The server might shut down uncleanly. In that case the FileWatcher might be unaware that its handle is invalid and simply stop reporting changes from that point on. Unfortunately, it usually won’t raise any error either, and there is no timeout. The only solution I know of is to re-register it frequently (setting EnableRaisingEvents to false and then to true). But if you do this, you need to take care not to lose any events.

My advice is to do some testing whenever you use the FileWatcher with a remote location. You could create a program that generates some files and verifies that the events are properly raised.
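Such a test could be structured like the minimal Python sketch below. It covers only the two ends of the test: creating the files and comparing names. Collecting the names that the FileWatcher actually reports (for example from its Created event) stays on the .NET side. The function names and the file-name pattern are hypothetical.

```python
from __future__ import annotations

import uuid
from pathlib import Path


def create_test_files(folder: str, count: int) -> set[str]:
    """Create `count` small files with unique names and return the names."""
    names = set()
    for i in range(count):
        # Recognizable, unique names make the test files easy to match and clean up
        name = f"watchtest_{i:05d}_{uuid.uuid4().hex[:8]}.txt"
        Path(folder, name).write_text("test\n")
        names.add(name)
    return names


def missing_events(expected: set[str], observed: list[str]) -> set[str]:
    """Return names that were created but never reported by the watcher."""
    return expected - set(observed)
```

Run create_test_files against the monitored share and collect the names the watcher reports for a while. An empty result from missing_events means every creation was seen. Repeating the test with larger counts shows roughly where events start to get lost.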

Buffer

The FileWatcher uses an internal buffer of 8 kB by default, which can be raised to 64 kB. This should be done if you are monitoring an entire disk, where there can be many events. However, it is costly because the buffer is allocated from nonpaged memory. Receiving an event frees some space in the buffer, so it is really important to process the events as fast as possible. Don’t do any checks in the event handler; add the event to a queue of your own and process it there asynchronously. If the buffer overflows, you get an error and you have probably missed events.
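The “enqueue now, process later” pattern can be sketched in Python as follows. This only illustrates the idea and is not .NET code. In a real FileWatcher application, on_change would correspond to your Created/Changed handler. The EventPump class name is hypothetical.

```python
import queue
import threading
from typing import Callable, Optional, Tuple

Event = Tuple[str, str]  # (change type, path)


class EventPump:
    """Decouples a fast event handler from slow processing (hypothetical helper)."""

    def __init__(self, process: Callable[[str, str], None]) -> None:
        self._queue: "queue.Queue[Optional[Event]]" = queue.Queue()
        self._process = process
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def on_change(self, change_type: str, path: str) -> None:
        # Called for every watcher event: only enqueue, no checks, no I/O
        self._queue.put((change_type, path))

    def _run(self) -> None:
        # Single worker thread, so events are processed in arrival order
        while True:
            item = self._queue.get()
            if item is None:  # sentinel posted by stop()
                break
            try:
                self._process(*item)
            except Exception as exc:  # keep the worker alive on bad events
                print(f"failed to process {item}: {exc}")

    def stop(self) -> None:
        """Process everything already queued, then end the worker thread."""
        self._queue.put(None)
        self._thread.join()
```

The handler returns almost immediately, so the watcher can keep draining its buffer. Any slow work, such as the lock checks described later, happens in the worker thread.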

Each entry in the buffer takes 12 bytes plus two times the length of the path, because the path is stored as 2-byte UTF-16 characters. In the worst case (all paths 260 characters long), a buffer configured with 8 kB can hold only 15 events.
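The worst-case figure follows from simple arithmetic: 12 + 2 × 260 = 532 bytes per entry, and 8192 / 532 ≈ 15.4, so 15 entries fit. The small Python helper below makes it easy to try other buffer sizes and path lengths. It uses the per-entry estimate from the paragraph above and ignores any alignment padding the real entries might add.

```python
def worst_case_events(buffer_bytes: int, path_chars: int = 260) -> int:
    """Entries that fit in the buffer if every path has `path_chars` characters.

    Estimate: 12 bytes of header + 2 bytes per UTF-16 path character.
    """
    entry_size = 12 + 2 * path_chars
    return buffer_bytes // entry_size


print(worst_case_events(8 * 1024))   # 15
print(worst_case_events(64 * 1024))  # 123
```

Even the maximum 64 kB buffer holds only about a hundred events with long paths. This is why fast handlers matter more than the buffer size.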

Error Handling

You need to subscribe to the Error event.

One reason an error can be raised is that the buffer has filled up. In that case you get an InternalBufferOverflowException, and it is likely that you have missed events.

You might also get an error when a remote location shuts down or a network error occurs. In that case it is usually advisable to re-register the watcher; otherwise it might not raise any more events (invalid handle).

Locking

You will receive many events while the file concerned is still locked. Therefore you need some logic that waits until the file is “free” before you process it.
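Here is a minimal Python sketch of such waiting logic, assuming that polling is acceptable. wait_until_free retries a check until it succeeds or a timeout expires. The check itself is pluggable. can_open_for_writing is one hypothetical check that relies on Windows refusing write access (PermissionError) while another process holds the file open without sharing it. It will not detect advisory locks on Linux.

```python
import time
from typing import Callable


def wait_until_free(path: str, is_free: Callable[[str], bool],
                    timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll is_free(path) until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_free(path):
            return True
        if time.monotonic() >= deadline:
            return False  # still locked: log it or re-queue the event for later
        time.sleep(interval)


def can_open_for_writing(path: str) -> bool:
    """One possible check (an assumption, mainly meaningful on Windows).

    "r+b" asks for write access without creating the file. On Windows this
    typically fails with PermissionError while another process holds the
    file open without sharing write access.
    """
    try:
        with open(path, "r+b"):
            return True
    except (PermissionError, FileNotFoundError):
        return False
```

This waiting belongs in the worker that processes queued events, not in the watcher's event handler itself.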

Configuring it correctly

You should set the NotifyFilter property to as few change types as possible so as not to stress the buffer too much. You can also create multiple FileWatcher instances that monitor the same directory, each configured for different change types. In that case, the downstream logic that merges all those events becomes more complex.

If you only need to watch certain file types, you can also set the Filter property.

Furthermore, the IncludeSubdirectories property should be set to true if you want to monitor an entire drive or folder structure.

Access Rights special case

Another strange aspect of the FileWatcher is that it won’t respect the access rights of the user registering the FileWatcher. Thus you can get events for files that your program is not even allowed to see. You should filter such events or at least be prepared to see them.

Using it with a Windows DFS based on Windows Server 2012 R2

I used it in a configuration which had multiple data servers behind a common root. It worked without any problem, except for the buffer overflows if lots of activity was going on.

I got events for DFSRPrivate folders. This is a typical case of access rights being ignored: the application was not allowed to access those “internal” files and needed to skip them.

Some events concerned temporary Office files (created when users open Word or Excel documents). I filtered them out as well.
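A filter for the two cases above might look like the following Python sketch. Office uses the ~$ prefix for its owner files (for example ~$Budget.xlsx). Treating every ~*.tmp file as an Office temporary file is an assumption. The example paths in the test are made up.

```python
from pathlib import PureWindowsPath


def should_skip(path: str) -> bool:
    """Return True for events the application should ignore (hypothetical rules)."""
    p = PureWindowsPath(path)
    # DFS Replication keeps internal data in DfsrPrivate folders; compare
    # case-insensitively because Windows paths are case-insensitive.
    if any(part.lower() == "dfsrprivate" for part in p.parts):
        return True
    name = p.name
    # Office "owner" files, e.g. ~$Budget.xlsx, appear while a document is open
    if name.startswith("~$"):
        return True
    # Assumption: other files starting with "~" and ending in ".tmp" are Office temp files
    if name.startswith("~") and name.lower().endswith(".tmp"):
        return True
    return False
```

PureWindowsPath parses UNC and drive paths the same way on any operating system, so the rules can be tested without a Windows machine.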

More information

  • http://blogs.msdn.com/b/winsdk/archive/2015/05/19/filesystemwatcher-follies.aspx
  • https://msdn.microsoft.com/en-us/library/system.io.filesystemwatcher%28v=vs.110%29.aspx
  • http://stackoverflow.com/questions/13916595/is-it-really-that-expensive-to-increase-filesystemwatcher-internalbuffersize

PHP script for sending personalized HTML newsletters

I searched for a simple solution to send personalized HTML e-mails with embedded pictures but didn’t find one, so I created the following script:

It uses PHPMailer and a simple CSV file to generate and send the e-mails. To run it, upload it to a server and execute it with:

To add new recipients, open the file recipients.csv in Microsoft Excel or LibreOffice and add new rows. When you save it, select the CSV format and choose the tab character as the delimiter.
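The actual script is PHP and uses PHPMailer. The following Python sketch only illustrates the same idea: read the tab-separated recipients.csv and fill a personalized HTML template for each row. The column names email and name, the template and the function name are hypothetical, and sending the mails is left out.

```python
import csv
import html
import io
from typing import List, Tuple

# Hypothetical template; the real script builds its own HTML in PHP
TEMPLATE = "<html><body><p>Dear {name},</p><p>{message}</p></body></html>"


def personalized_mails(csv_text: str, message: str) -> List[Tuple[str, str]]:
    """Return (address, HTML body) pairs from tab-separated recipient data.

    Assumes a header row with the hypothetical columns 'email' and 'name'.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="\t")
    mails = []
    for row in reader:
        # Escape user data so names like "Bob & Co" don't break the HTML
        body = TEMPLATE.format(name=html.escape(row["name"]),
                               message=html.escape(message))
        mails.append((row["email"], body))
    return mails
```

To read the real file, use something like open("recipients.csv", newline="", encoding="utf-8"). The encoding is an assumption, because spreadsheet programs may save tab-delimited text in a legacy code page.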

The generated email in this example looks like this:

It can be freely adapted in the code; you just need to escape " as \".

Download the code

Find out which library is missing on Fedora 22

Sometimes some libraries are missing. This leads to the following error message:

It only displays the first missing library, so first use the following command to identify all of them:

This might produce the following output, where you can see a list of all missing libraries:
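If the output is long, the missing entries can also be picked out programmatically. This Python sketch assumes the listing comes from ldd, which prints lines of the form “libfoo.so.1 => not found” for unresolved libraries. The helper names and the sample library names in the test are hypothetical.

```python
import subprocess
from typing import List


def missing_libraries(ldd_output: str) -> List[str]:
    """Extract the names ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Unresolved entries look like: "\tlibfoo.so.1 => not found"
        if "=> not found" in line:
            missing.append(line.split("=>", 1)[0].strip())
    return missing


def check_binary(path: str) -> List[str]:
    """Run ldd on a binary and return all libraries it cannot resolve."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_libraries(result.stdout)
```

Each name returned this way is what you then search for in the next step.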

The next step is to find out which package you need to install (32-bit libraries are in /usr/lib, 64-bit libraries in /usr/lib64):

This will output something like:

To install it, you can omit the version number and the .fc22 suffix:
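The rule of dropping the version and the .fc22 part can be expressed as a small Python helper. It assumes the usual RPM name-version-release.arch form, where the name may contain hyphens but version and release cannot. It keeps the architecture, so a 32-bit .i686 package is still selected. The function name and the package strings in the test are illustrative, not real output.

```python
def install_name(nvra: str) -> str:
    """Turn 'name-version-release.arch' into 'name.arch' for installing."""
    base, arch = nvra.rsplit(".", 1)             # split off the architecture
    name, _version, _release = base.rsplit("-", 2)  # release includes .fc22
    return f"{name}.{arch}"


print(install_name("glibc-2.21-13.fc22.i686"))  # glibc.i686
```

Splitting from the right matters because package names such as openssl-libs contain hyphens themselves.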