Perl Weekly Challenge 013

These weekly challenges always seem to open the door to further reading and research into interesting topics and this week was no different. Part 1 this week opened the door to the surprisingly deep body of work on algorithms to determine the day of the week. Most modern implementations trace themselves only as far back to 1992 when Tomohiko Sakamoto posted his algorithm to the UseNet group comp.lang.c. Part 2 led to some interesting reading on mutually recursive methods.

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks  shown  link to GitHub Gists.

Part 1

Sample Run

$ perl perl5/ch-1.pl
2019/01/25
2019/02/22
2019/03/29
2019/04/26
2019/05/31
2019/06/28
2019/07/26
2019/08/30
2019/09/27
2019/10/25
2019/11/29
2019/12/27

I stuck with what seems to be the more or less usual implementation of the Sakamoto algorithm. In an effort to hopefully make the code a bit more readable I made liberal use of use constant. This is something I like to do and it often comes up in my code. I think the effects are generally positive, the extra verbosity never seems to hurt. In general I like to cut down on as many literal strings, ints, and floats whenever possible. These used to be referred to as "magic numbers" because if you saw them in someone else's code without explanation their meaning might be hard to explain ... simply magic! 

Collapse )

Installing AI::MXNet on OS X

I previously used the Perl MXNet bindings on a Deep Learning project a while back,  over a year ago now. I hadn't really used it much since after I changed roles at work. Recently I have been getting back into Deep Learning and so installed the most recent version of AI::MXNet for the Perl I am using now (5.28.0 via perlbrew). I use Macports to manage external packages. 

The official instructions are somewhat different than what I found I needed to do to successfully install everything so I thought it might be useful to publicly post my build notes.

If you are interested in building with GPU support (and you should be!) please also follow the instructions necessary to include the CUDA Toolkit. This can be considered optional, however, since you can get started with MXNet using just CPU.

Pre-requisites

CUDA Installation

Follow the instructions for downloading the CUDA Toolkit and CuDNN package. Just note that the location of your CUDA installation is going to be  like /Developer/NVIDIA/CUDA-10.X or something else which is somewhat different than shown in the instructions.

If you are using MacPorts you can, of course, safely ignore the Homebrew instructions both in this section and elsewhere. I have included the MacPorts equivalents where needed in this article.

You will also need to make sure that you use the clang compiler and not the GNU gcc/g++ compiler you might get from MacPorts or else you'll get an error: 

nvcc fatal  : GNU C/C++ compiler is no longer supported as a host compiler on Mac OS X.

You can fix this by modifying the following two lines in config.mk

export CC = clang
export CXX = clang

Since clang is a requirement for MacPorts this should be present. It's typically installed by Xcode and the accompanying Xcode command line tools.

The rest of these instructions do not explicitly cover building with GPU support.

General Requirements for building the MXNet library

sudo port install coreutils

sudo port install pkgconfig

sudo port install graphviz

sudo port install openblas-devel

sudo port install opencv

sudo port install swig

sudo port install swig-perl

Required for the Perl bindings

perl -MCPAN -e "install PDL"

perl -MCPAN -e "Mouse" 

perl -MCPAN -e "Function::Parameters" 

perl -MCPAN -e "Hash::Ordered"

perl -MCPAN -e "install PDL::CCS"

Building

My system [existing setup]

gcc version 9.1.0 (MacPorts gcc9 9.1.0_2)

perl-5.28.0 via perlbrew

Compiling

git clone --recursive https://github.com/apache/incubator-mxnet mxnet

cd mxnet

cp make/osx.mk ./config.mk

echo "USE_BLAS = openblas" >> ./config.mk

echo "ADD_CFLAGS += -I/opt/local/include" >> ./config.mk

echo "ADD_CFLAGS += -I/opt/local/include/opencv" >> ./config.mk   [Optional. See below about OpenCV]

echo "ADD_LDFLAGS += -L/opt/local/lib" >> ./config.mk

echo "ADD_LDFLAGS += -L/opt/local/lib/graphviz/" >> ./config.mk

make -j$(sysctl -n hw.ncpu)

Error!

Compilation failed initially with his error:

/Users/adamcrussell/Projects/mxnet/3rdparty/mshadow/mshadow/./base.h:162:14: fatal error: cblas.h: No such file or directory
162 |  #include <cblas.h>
| ^~~~~~~~~
compilation terminated.

I checked /opt/local/include and it seems that the file I want is called cblas_openblas.h on my system. I am not sure if this is a recent change to the OpenBLAS package or something specific to Macports. I modified base.h to use cblas_openblas.h and everything proceeded to completion without error.

OpenCV

I ended up commenting out the line in config.mk for OpenCV

# whether use opencv during compilation
# you can disable it, however, you will not able to use
# imbin iterator
# USE_OPENCV = 1
USE_OPENCV = 0

OpenCV can be an annoyance during the final linking of the library with the need to add lines like ADD_LDFLAGS += -lopencv_imgcodecs in config.mk to get everything working. I have never used the imbin iterator so don't think this will ever impact my work. If you feel like you need this, most likely because you are going to work on some sort of computer vision project, then take note of what I wrote about the need for extra LDFLAGS!

Post Build

If you build the MXNet library and then did not either move it to (or create a symlink to) /opt/local/lib or somewhere else your system expects to find libraries then you will absolutely need to set the following environment variable

export DYLD_LIBRARY_PATH=/Users/adamcrussell/Projects/mxnet/lib:$DYLD_LIBRARY_PATH

Obviously your location will be different. You may even want to add this to the bottom of your .bash_profile so it is always set for you. If this is not set and you do not have the library (or a symlink) in a standard library location you will see this sort of error

$ perl -MAI::MXNetCAPI -e 0

Can't load '/Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-thread-multi-2level/auto/AI/MXNetCAPI/MXNetCAPI.bundle' for module AI::MXNetCAPI: dlopen(/Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-thread-multi-2level/auto/AI/MXNetCAPI/MXNetCAPI.bundle, 1): Library not loaded: @rpath/libmxnet.so

Referenced from: /Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-thread-multi-2level/auto/AI/MXNetCAPI/MXNetCAPI.bundle

Reason: image not found at /Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/lib/5.28.0/darwin-thread-multi-2level/DynaLoader.pm line 197.

at -e line 0. 

Again, this is due to the Perl interpreter simply being unable to locate the mxnet library.

Up to this point we have only built the shared library with no language specific bindings. Now that the shared library is built you can finish everything off by building the Perl bindings and maybe some for other languages as well if they interest you.

Building the Perl Bindings

cd ${MXNET_HOME}/perl-package/AI-MXNetCAPI/
perl Makefile.PL
make
install_name_tool -change lib/libmxnet.so  ${MXNET_HOME}/lib/libmxnet.so  blib/arch/auto/AI/MXNetCAPI/MXNetCAPI.bundle
make install

cd ${MXNET_HOME}/perl-package/AI-NNVMCAPI/
perl Makefile.PL
make

install_name_tool -change lib/libmxnet.so  ${MXNET_HOME}/lib/libmxnet.so  blib/arch/auto/AI/NNVMCAPI/NNVMCAPI.bundle
make install

cd ${MXNET_HOME}/perl-package/AI-MXNet/
perl Makefile.PL
make install

Sample Run

You can do a simple test, just to make sure everything is installed correctly, a few ways. 

Interactive PDL shell

$ pdl
perlDL shell v1.357
 PDL comes with ABSOLUTELY NO WARRANTY. For details, see the file
 'COPYING' in the PDL distribution. This is free software and you
  are welcome to redistribute it under certain conditions, see
  the same file for details.
ReadLines, NiceSlice, MultiLines  enabled
Reading PDL/default.perldlrc...
Found docs database /Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/lib/site_perl/5.28.0/darwin-thread-multi-2level/PDL/pdldoc.db
Type 'help' for online help
Type 'demo' for online demos
Loaded PDL v2.019 (supports bad values)

Note: AutoLoader not enabled ('use PDL::AutoLoader' recommended)

pdl> use AI::MXNet q/mx/;
pdl> print $arr = mx->nd->ones([1, 9])
<AI::MXNet::NDArray 1x9 @cpu(0)>

One Liner

$ perl -MAI::MXNet="mx" -e '$arr = mx->nd->ones([2, 7]); print $arr;'
<AI::MXNet::NDArray 2x7 @cpu(0)>

Test Script

The script calculator.pl is taken from Sergey Kolychev's blog. Sergey is the author of AI::MXNet so if you go to his blog please add a comment thanking him for all his hard work in bringing powerful modern Deep Learning abilities to Perl!

$ perl calculator.pl
Epoch[0] Train-mse=0.154705
Epoch[0] Time cost=0.105
Epoch[0] Validation-mse=0.003525
Epoch[1] Train-mse=0.001573
Epoch[1] Time cost=0.105
Epoch[1] Validation-mse=0.001020
Epoch[2] Train-mse=0.000495
Epoch[2] Time cost=0.104
Epoch[2] Validation-mse=0.000227
Epoch[3] Train-mse=0.000083
Epoch[3] Time cost=0.104
Epoch[3] Validation-mse=0.000010
.
.
.

Perl Weekly Challenge 012

This week's challenge had a great problem selection. Both parts had succinct easy to understand problem statements and yet they both allowed lent themselves easily to the variety of solutions styles of the sort Perl is famous for. Details below!

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks  shown  link to GitHub Gists.

Part 1

Sample Run

$ perl perl5/ch-1.pl
E_6 = 30031 is composite.

For Part 1 I decided to not bother computing the primes myself. After deciding that I further thought that having a table of hard coded primes in the code was a bit boring and so I found a web site that had the table of many primes already computed. My code starts by grabbing that table of pre-computed primes (the first 10k) with a basic LWP::UserAgent and feeding them into an array. The page that is sent includes some header text but the primes themselves are arranged in rows of 10. To make sure we only include those numbers in our array we check to make sure that the split rows of input that have less than 10 columns are ignored (lines 21-23).

The rest of the solution proceeds pretty much as you'd expect with increasing sized array slices of the array of primes being used to compute successive Euclid Numbers and then checking to see if they are prime. One somewhat uncommon technique is the use of eval to compute the product of primes on line 29.

The first composite Euclid Number is printed if found and then the code immediately exits. If not, a message informing that none are detected is shown.

Part 2

Sample Run 

$ perl perl5/ch-2.pl
Path in common is /a/b.

Part 2 had some potential for an elaboratly over engineered solution if I felt so inclined. I think this might be a fun way to invlove Graphs, for example, as it was reminiscent of the previous Word Ladder challenge. I did not have the ability to dive into a deliberately over engineered solution this week and so stuck to basics!

The file separator is expected to be the first line of input. That is read in and from here the rest of the input is assumed to be paths to check. Each path is stored as an array reference containing the directories. This is done by splitting on the file separator (line 17). Note the use of quotemeta in order to make sure that the file separator is properly escaped during variable interpolation.

Paths are checked by removing the first directory from the first path and then comparing to the first directory in all the other paths. If they all match then that directory is saved and successive directories are checked until no directories remain or a non-match is detected.

Notes:

  • chop is not a very frequently used function but it is convenient for the sort of thing used for here (line 38) which is to remove the trailing file separator from the output. 
  • The last on line 35 is to make sure that we exit the outermost loop and do not continue to check directories.
  • This code (intentionally!) has a particular style to it that I would describe as an older C style of coding because of the way I wrote the for loops, the boolean flag $in_common, and the use of last. I think it's very readable and easy to follow, although it's different from what would be called idiomatic Perl.
  • A more general thought on code style: for a very long time now, at least twenty years, you really only hear about programming style when discussed in a negative context. For example, a company may have a specific style guide which enforces rules of how code should be written. Even more strictly, these style guides are many times enforced with code checking tools that examine code as it is checked into source code versioning repositories. This is somewhat depressing because before this era of style enforcement there was a view that code could be written in a way that could be described as beautiful in the same way that a work of fine art may be considered. Knuth's Literate Programming book comes to mind as an example of this school of though, in particular the very first chapter. I have yet to find a style guide for any programming language which yields code I find particularly attractive either in an aesthetic sense, or in any sense of elegance of implementation. The Perl programming community may very well be the last active area of Computer Science where code creativity and style has any appreciation.  

Perl Weekly Challenge 011

Both parts of this week challenge seemed straightforward. Details below!

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks  shown  link to GitHub Gists.

Part 1

Collapse )

Perl Weekly Challenge 010

As you'll see below I took a bit of an over engineered approach to Part 1 by using  Parse::YYLex with byacc. That was a fun rabbit hole to dive into and something that I'll continue to explore even after this week's Perl Weekly Challenge concludes.

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks  shown  link to GitHub Gists.

Part 1

Sample Run

$ ./perl-byacc1.8.2/byacc -d -P RomanParser roman.y
$ perl -I. ch-1.pl
MMI  
2001
CDIV
404

The code for this look unexpectedly short because all the work is being done by the lexer and parser! By running byacc (with the -d option) we generate the following files based on the input file roman.y

  • RomanParser.pm — the parser itself
  • y.tab.ph — numerical token list (required)

The file roman.y is the bulk of the effort for this part of the challenge. It is fairly large so check it out at the link above. It contains the actions to take for each Roman numeral generated by the tokens given to it from the yylex() function implemented within that file.

Part 2

After spending so much over engineering energy on Part 1 this second part of the challenge was much more straightforward! Well, somewhat. I was recently reading the Vandevoorde and Josuttis C++ Template book and had the idea of doing as much during compilation as possible stuck in my head! Whence the somewhat more prolific use of use constant than usual.

Perl Weekly Challenge 009

This week's Perl Weekly Challenge presented a problem I hadn't really thought of before: how to do a ranking or what is mathematically referred to as a weak ordering.

Part 1



Sample Run
$ perl perl5/ch-1.pl
First square with five distinct digits: 12769 (= 113 * 113)



I think this was straightforward enough of a problem with a straightforward solution: start iterating through integers until you find one that has the property we want. The number must have a square root of at least 100 since it has at least five digits so that is our starting point. Counting the unique digits (see line 12) is done using a clever trick suggested by Prakash Kailasa via a Gabor Szabo Perl Maven blog. The trick is to create an anonymous hash with the values from the array as the keys. This will remove the duplicates since hash keys must be unique. Originally this could be done without the %{} around the map {$_ => 1} because for a time keys() could be applied to a hash reference however this is no longer allowed in perl-5.28.0 (I think it was removed in perl-5.14.0?).

Part 2

Sample Run

$ perl perl5/ch-2.pl
Name Score Rank
NEH 2 1
FED 3 2
OKN 3 2
DRZ 4 4
NPK 5 5
XRK 5 5
WXV 7 7
QHC 9 8
IEE 10 9
CYP 10 9
================
NEH 2 1
FED 3 3
OKN 3 3
DRZ 4 4
NPK 5 6
XRK 5 6
WXV 7 7
QHC 9 8
IEE 10 10
CYP 10 10
================
NEH 2 1
OKN 3 2
FED 3 2
DRZ 4 3
XRK 5 4
NPK 5 4
WXV 7 5
QHC 9 6
CYP 10 7
IEE 10 7




This presented some nice opportunities for creativity. It's a standing rule, as far as I know, that if a challenge statement has any ambiguity than you can simply interpret it as you like. This leads to solutions having a variety of styles and flavors. The problem statement didn't suggest what in particular we should rank so I decided to do a ranking of basic Perl objects. These are stored in an array, sorted, and then sent to the ranking functions. The sorting and ranking functions were written as generically as possible to allow for any sort of object as long as it has an accessor that returns a single numeric value. A rare bit of explicitly polymorphic code in Perl.

I think that the ranking logic would have been a little less convoluted with a better choice of data structures. I think a doubly linked list (i.e. with forward and previous references) would have been best. Still, the Standard and Dense rankings did not cause too much trouble to implement. The Modified ranking was the most complex. A concise definition from Wikipedia is that, for the Modified ranking, each item's ranking number is equal to the number of items ranked equal to it or above it. But what about ties for first place? And multi-way ties? Based on the sample output shown I think I achieved all the goals of the modified ranking but this is a good example of having someone else check your work! Especially a domain expert, someone who may have worked with this ranking many times in the past. Some searches didn't reveal much use of the Modified ranking. I did find a suggestion on how to address ties for first in Modifed Ranking here. Overall documentation of examples implementing the Modified ranking was otherwise not turning up in my searches.

I did make use of a core module which I do not ordinarily use, Tie::RefHash. For this particular problem I found it convenient to use the object references as hash keys and to do so requires the use of this module.

There is nothing wrong with use Threads;

Fight me

In a previous article I showed how a significant performance improvement can be made by re-implementing key parts of your Perl code in C++. This was done as an example of improving a brute force computation of the first five perfect numbers. In my writeup of that initial problem I discussed how the best way to improve the speed of that computation was an algorithmic improvement, using well known mathematical relationships to compute the perfect numbers directly and not doing a large scale brute force evaluation of tens of millions of numbers. Still, the fact remains that substantial performance improvements can be made through either one or both of (a) parallelizing the problem and spreading the computation out to run concurrently in pieces and (b) implementing the most computationally intense parts of the code in a compiled language such as C++. (b) was discussed previously, as mentioned above, here we'll take a look at (a). Specifically we'll look at using the perl Threads module. First, though, let's discuss the use of the Threads module.

Threads in Perl

If you want to take full advantage of your hardware and divide computationally intensive tasks across all your CPU cores, and have your code be fully portable, then you need to use Threads. Perhaps the only other option would be to use an MPI implementation, but that introduces significant complexity. Ignoring portability you could you use a fork(), perhaps using one of several modules that makes using fork more convenient. If you choose to ignore having any meaningful benefit to dividing your computationally intense code into different threads of execution than there are modules such as Coro which provide co-operative threads. Co-operative threads share the same CPU so using them for a significant computation would be for cosmetic code organizing purposes only. For non-computational tasks they may provide some benefit beyond the merely cosmetic. For example, when reading from several data sources you can avoid reading them completely in sequence and instead read a little from each until all reads complete. Note that this mediocre sort of threading is as good as you will ever get if you are a Python developer. Well, at least if you use the main CPython implementation of that language. Jython makes full use of the JVM's very nice multi-threading abilities. Most Python developers don't know or care about this limitation because they are mostly just calling wrappers to powerful libraries written in C++ such as TensorFlow. Those libraries of course make full use of C++'s multi-threading as well as GPU APIs. 

Now, if you look at the Threads documentation you will find it's written in this extremely hand wringing tone all but begging you not to use them. This is ridiculous and the result of an absurd argument from people that are loud pedantic jerks mean well. The fact is that the Threads module is not implemented in the way an experienced programmer might expect and so it comes with a decent list of use cases which must be avoided. This is mostly because each thread is given its own Perl interpreter. Many all of these cases to avoid are fairly obscure such as "don't use END blocks in a thread". Yeah, sure. Thanks for the heads up.

Indeed I am not aware of any actual horror stories of the Threads module causing trouble in practice. Those that are aware of their limitations either work around them as necessary or write test cases in an effort to bemoan these limitations.

If you're breaking up a tough computation into smaller pieces to take advantage of multiple CPUs Threads provide a simple interface to unleash a lot of processing power. Just read the documentation first and you'll be fine. 

The Code

Original code but now threaded

This is the same code as used in the initial problem but with Threads. Getting a rough idea of run time with time we see that it takes about 78 minutes. That is coming down from the original take ~257m. So a little over a 3x speedup. 

$ time perl ch-1t.pl
6 28 496 8128 33550336 

real 77m58.142s
user 268m17.491s
sys 0m21.845s

We already saw that a substantial speedup is obtained by re-writing the factoring and perfect checking in C++. What if we combined that code with Threads?

Native Library + Threads

$ time perl -Iext -Iext/blib/arch/auto/Perfect ch-1xt.pl
6 28 496 8128 33550336 

real 5m40.543s
user 20m6.924s
sys 0m4.594s

Great! We are now able to find the first five perfect numbers with brute force in a little over 5½ minutes. That's a roughly 45x speedup from the original pure (single threaded) perl code.

Notes:

  • The number of threads is set to be the number of cores available on my (Late 2012) iMac. A supercomputer it is not.
  • The RANGE_SIZE is set to a number that I determined experimentally from a few runs to best reduce the number of wasted evaluations but still put the threads to best use. Remember, these are ithreads and so they don't come cheaply. A new interpreter is created for each one. Angry pedants would scold me for not having creating a thread pool and re-used threads. Also, we're going to end up evaluating numbers that we don't need to because a previously created thread will have found the last perfect number while other threads are still running. I'd argue this is a reasonable tradeoff and when looking for larger and larger perfect numbers not that bad, they do occur very sparsely after all. As a reminder: we're only using this perfect number search as a MacGuffin to try out some different methods. If we were really holding ourselves to the mathematics of perfect numbers we'd have computed them directly and have been done with it!
  • It's possible that the perfect numbers would be printed out of numerical order as the threads complete. For the first five this is not really a problem since the first thread created should usually finish well before the last one given the distance between these numbers. Remember, ultimately these threads get executed at the pleasure of your OS and on a busy system unexpected things may happen!
  • The time command gives us a decent rough idea of execution time but it is no substitute for closer benchmarking of code if we're really interested in identifying any particular bottleneck.
  • This is what the core utilization looked before, during, and after a run of ch-1xt.pl. (Ignore the blank to the left which accounts for the time before measuring started) The third and fourth cores seem a but underutilized before being tasked with this computation. You paid for all those cores so use them! use Threads!

Summary

This is the final chapter in a story that started by taking a naive computational approach to identifying perfect numbers and improving it first by re-implementing parts in C++ and then by introducing Threads. Both lead to noteworthy improvement in performance, in fact, a very significant improvement when taken together. 


Integrating C++ and Perl with SWIG

In another article I mentioned that it's possible to call code written in C++ from Perl. Perl offers two ways to do that:

  1. Perl's XS (eXternal Subroutines)
  2. SWIG

Here SWIG will be used. I like SWIG better than XS because it is somewhat more straightforward to integrate with C++ (XS is much more C-centric) and also because the steps necessary to wrap C++ code for use by Perl can be readily modified to work with other languages whereas XS will only let you integrate with Perl. In other words, if you have a significant C++ library that you would like to make available to a variety of programming languages SWIG is your best bet!

This came up when discussing the solution to one of the Perl Weekly Challenges in which the problem was to computer Perfect Numbers. The issue is that Perfect Numbers occur very sparsely and a brute force method to find them when written in even a fast interpreted language such as Perl will take a long time as it searches through many millions of numbers. The preferred solution to speeding up that computation is to use a little mathematical sophistication and use one of the several well known methods for computing Perfect Numbers directly. Algorithmic improvements lead to the greatest performance gains.

Often times, however, in practice a significant algorithmic improvement is not forthcoming or the time to spend researching it is disallowed by, say, project schedules. In these cases what we may want to do is write the computationally intensive parts of the code in a compiled language such as C++ and call it from our Perl code. This will allow the use of Perl to more rapidly develop the system as a whole while, as needed, certain parts are re-written in a way that improve overall performance.

The original code is shown in the previous article. That script, written only in Perl took nearly four and half hours to find the first five Perfect Numbers. 

$ time perl perl5/ch-1.pl    
6 28 496 8128 33550336 

real 257m33.621s
user 255m52.169s
sys 0m24.603s

Let's see what speedup we get from re-implementing the most computationally intense parts of the code in C++.

Overview

First be sure to install swig! Otherwise it's assumed you already have a C++ compiler and Perl installation.

I am using OS X 10.14 "Mojave" with perlbrew installation of perl-5.28. I use macports and have installed gcc9.

The swig installation is done in two parts: 

    $ port install swig
   $ port install swig-perl

Other systems, such as Linux, will have similarly named packages available.

I am using swig 3.0.12. swig 4.0 was released a few weeks prior to this writing but is not yet available on macports. That is fine, none of the latest features are necessary.

Also, I will save the extra issues related to creating a distributable module (e.g. suitable for uploading to CPAN) for a later article. 

Required Project Files

  1. All the above files are linked to GitHub if you'd like to download the code.
  2. Other files will be generated by swig but these are the ones that we must create.

Building (the easier way)

  1. swig -c++ -perl perfect.i  generates perfect.pm and perfect_wrap.cxx. perfect.pm is the Perl side of your new module. perfect_wrap.cxx is, as the name implies, the native wrapper side of your module. Both of these files are generated based on what you've put into your interface file (e.g. perfect.i). If you want to modify these files do so by changing your interface file!
  2. make Makefile.PL this will create a Makefile. This step is optional in the sense that you can definitely compile and link everything yourself but MakeMaker does some nice extra stuff, such as create all the necessary perl packaging files. It's amazing how convenient this is!
  3. make assuming you ran the command in the previous step this will compile and link the C++ code to a dynamic library as well as package it. At this point you might consider yourself done.
  4. make install this will install your new module in a location that Perl can easily find it for your use. This is definitely optional. While still developing your module you'll likely want to leave it in a local directory for testing.

Note that we didn't create any tests in this basic use case. That should be something you add in later and when you do be sure to run a make test before you install to make sure everything is OK. 

Also, none of the above is concerned with distribution of your module. To do so requires a little more packaging which we will discuss later. For now we are focussing on building everything up to a usable state for a single developer.

Running

After going through all the above you will have a directory that looks something like this.

MakeMaker did us quite a favor in creating several additional files and directories. To run the script ch-1x.pl from within this directory you would execute as shown next. The -I arguments are necessary to tell perl where to find the perfect.pm (Perl module) and perfect.dylib (OS X dynamically linked library) files.

Sample Run

perl -I. -Iblib/arch/auto/Perfect ch-1x.pl
6 28 496 8128 33550336 

That seemed to run quicker. But I wasn't watching closely. What does time tell us?

time perl -I. -Iblib/arch/auto/Perfect ch-1x.pl
6 28 496 8128 33550336 

real 19m31.171s

user 19m24.311s

sys 0m2.196s

That's great! That is about 14x faster than the pure Perl version! Again, for this particular problem this is not the method for getting the absolute best performance but it definitely shows that some great performance gains can be achieved by re-writing computationally intensive parts of your system in C++ vs plain Perl.

Details

Let's take a deeper look at what we did, starting by examine the files required for this project.

Required Project Files

We are declaring a pretty basic class. A constructor, destructor, and single method isPerfect().

We don't do much with the constructor or destructor, the code is really all in the isPerfect() method which is a very close translation of the previous pure Perl program.

The syntax of the interface files is a little unusual but this is small enough to not be too confusing. We declare that we are creating a module called Perfect. According to the swig docs The %{ %} block provides a location for inserting additional code, such as C header files or additional C declarations, into the generated C wrapper code. So that is why we need to include our header file perfect.h there, so it gets included in the generated perfect_wrap.cxx file. The seeming duplicate mentioning of perfect.h in the line %include "perfect.h" is not actually redundant. The %{%} block includes that header in the wrapper but what comes later is what is parsed by swig to actually generate the wrappers. In this smaller example they appear redundant but in larger more complex examples the different sections of the interface file become more distinct.

Here we actually get to put all our work to use! Notice how we use Perfect; and declare and use a Perfect object.

  • A MakeMaker file (Makefile.PL) [somewhat optional but highly recommended]

The contents of this file are small, just state your module name and the object files output by the compilation step and you should be good to go.

This is "optional" because it is possible execute the commands to build and link everything yourself, either in a Makefile you write or just executing the right commands in sequence. By using MakeMaker you get all the extra packaging files such as the blob directory and meta files with no extra effort. You also get the ability to easily install the new module on your system. By using MakeMaker you are probably 80% of the way to having created a distributable module, suitable for inclusion on CPAN.

For the sake of completeness and a slightly fuller understanding here is what doing the build manually looks like ...

Building (the harder way)

$ swig -c++ -perl perfect.i

$ g++ -c perfect.cxx perfect_wrap.cxx `/Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/bin/perl -MExtUtils::Embed -e ccopts`

$ g++ -dynamiclib -single_module -flat_namespace -undefined suppress -o perfect.dylib perfect.o perfect_wrap.o -L /Users/adamcrussell/perl5/perlbrew/perls/perl-5.28.0/lib/5.28.0/darwin-thread-multi-2level/CORE/ -lperl

Sample Run 

$ perl -I. -MPerfect -e 'my $p=new Perfect::Perfect();print $p->isPerfect(6);'
1

Just what we hoped for, a true value returned. We can now confidently run             ch-1x.pl as we did before.

We use the -I. to make sure that the interpreter sees perfect.pm and perfect.dylib in the current directory. If they occur in other locations than be sure to specify that with -I. In earlier examples you may have seen -Iblib/arch/auto/Perfect for this same reason.

Clearly this was done on a Mac OS X system and is very specific to my particular configuration with perlbrew. You'll need to adjust accordingly. In particular on Linux systems, not using perlbrew, for example your last line might look more like

g++ -shared perfect.o perfect_wrap.o -L /usr/lib/ -lperl -o perfect.so

Summary

This article has shown the construction of a Perl module with portions written in C++ and integrated with SWIG. The example module is small and straightforward but demonstrates the main capabilities of SWIG. More complexity comes into play when using advanced C++ features such as templates and nested namespaces, however SWIG is able to handle these well and the integration of more advanced C++ code and Perl may be the subject of a future article.


Perl Weekly Challenge 008

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks  shown  link to GitHub Gists.

Part 1

This is straightforward enough of an implementation, without much code that requires a lot of explanation. For each candidate number we compute the factors and then add them up. If the sum of the factors is equal to the number itself it's perfect, print it and continue the search. I am re-using the pack/unpack trick to do the summing, like I have done before.  So what could go wrong!?!?

Look at the sample run below though. I executed it use the time command. time is a quick way to see just how long a command takes to execute and here we see that to find the first five the above code takes nearly four and half hours!

Sample Run

$ time perl perl5/ch-1.pl    
6 28 496 8128 33550336 

real 257m33.621s
user 255m52.169s
sys 0m24.603s

The reason for this long run time is that Perfect Numbers become extremely sparse very quickly. The first four are typically found within seconds but the fifth takes the rest of the time because 33,542,208 numbers must be checked before it is found. Even a fast interpreted language like Perl running on modern hardware will take a long time to examine so many numbers. So how to make things faster? There are three ways:

  1. Algorithmic improvement
  2. Parallelization
  3. Write the computationally intensive parts in a compiled language such as C++. 

The answer for this problem is definitely (1). Useful properties of Perfect Numbers are well documented. A good web page to reference would be http://mathworld.wolfram.com/PerfectNumber.html. There you would find that you can compute Perfect Numbers in several direct ways due to properties related to Triangular Numbers, Mersenne Primes, and Hexagonal Numbers. The compute time to find the first five would be at most a few seconds even when written in an interpreted language such as Perl when running on contemporary hardware. 

Even though (1) is the best answer for this and most any performance problem you may ever encounter (2) and (3) are worth discussing because in many practical situations they offer the best solution (usually in terms of time and money!).

(3) is discussed in detail here. The results are that we can find a ~14x speedup, from 257m33.621s to 19m31.171s.

A combination of (2) and (3) is discussed here and results in a ~45x speedup, from 257m33.621s to 5m55.348s.

Part 2

Sample Run

Details

The logic used here is that we must first find the longest line. Once we determine the longest line find the midpoint of that line. The midpoint of the longest line will be where we want to center all the other lines. The padding needed is computed as the difference between the middle of the longest line and the middle of any other line. In the case of the longest line itself no special handling is needed, its padding length is zero. I use the same eval() and tr() method to find line lengths as I have done before.

Perl Weekly Challenge 007

As with the previous weekly challenges the problem statements are short  and are included in the first comment of the code. The code blocks shown  link to GitHub Gists.

Part 1

I think this is straightforward enough, with the exception of the use of pack and unpack.

Sample Run

$ perl ch-1.pl
1 2 3 4 5 6 7 8 9 10 12 18 20 21 24 27 30 36 40 42 45 48 50 54 60 63 70 72 80 81 84 90 100 102 108 110 111 112 114 117 120 126 132 133 135 140 144 150 152 153 

pack/unpack

The line unpack("%32C*", pack("C*", @digits)) is a bit mysterious looking at first so let's unpack it (pun intended!) .

We are first, in the inner pack() call, creating a binary representation of the @digits array according to a template.

  • The 'C' means to break the input into  unsigned chars
  • the '*' means repeat the 'C' for the rest of the input.  

    The template "C*", in other words, means "treat every item in this array as an unsigned char" and then pack them together. The result is a binary representation which, to be quite honest, means just about nothing unless you know, or are able to guess, the template.

So then we use unpack()'s checksum function to sum the digits, which is really all we are after here anyway.

  • The '%' is just telling unpack to use the special checksum function
  • The 'C' means to break the input into  unsigned chars (because that is how they were packed to begin with) 
  • the '*' means repeat the 'C' for the rest of the input (because that is how they were packed to begin with) 
  • The 32 means to compute a 32-bit checksum

The two functions share the same template syntax which is documented on the pack manpage.

More on binary formats

If you are curious just what the raw binary created by pack() looks like you can use the xxd command with the -b option. For example, suppose I ran a command like the following:

perl -e '$i="[A]2019";@d=split(//,$i); print pack("C*", @d);' > /tmp/pack

can you guess what A is based on this output from xxd?

$ xxd -b /tmp/pack
00000000: 00000010 00000000 00000001 00001001                    ....

If you find this figuring out of the meaning behind binary strings interesting than perhaps you have a future as a reverse engineer! If not, than you are a normal well adjusted person. :D

Part 2

Sample Run

$ perl ch-2.pl
cold -> cord -> card -> ward -> warm

Overview

To compute a word ladder we can calculate the shortest path between the two words, as represented by vertices, on a graph. To create the graph                              sub build_graph() creates a vertex in the graph for each word and does a pairwise comparison between all words. If they differ only by one letter then an edge is drawn between the two vertices. Once the graph is created we perform a Dijkstra Single Source Shortest Path computation which will compute all shortest paths from a given source vertex to all other vertices. This may seem wasteful, we are only concerned with one such path after all, right? Interestingly, the worst case time complexity is the same whether we abort the algorithm after finding the path we want or keep going and compute them all. There is some wasted calculation of course but, asymptotically speaking, we don't need to be too anxious about it.

The code shown above gets a little dense in place although I think at a high level the approach is not too hard to comprehend.

  1. Create the graph. I use the Graph module from CPAN. 
  2. Select a source word (vertex). Compute all shortest paths from this source.
  3. return the path we are interested in for our word ladder.

Details

As mentioned before some parts of the code get a bit dense. This is not out of some intent to be purposefully obfuscated! As always with Perl There Is More Than One Way to Do It and the way I chose fits a balance between readability and still not being overly verbose. I think.

In sub build_graph these lines

 my $length_w0 = do{    
   $w0 =~ tr/[a-z]//;           
};

compute the string length of $w0 by using the return value from tr which is the number of matches made. The same technique is used a few lines down with

my $differences = eval "\$w1 =~ tr/$w0//";

but here we need to wrap the tr in an eval because tr does not perform any variable interpolation since its transliteration table is built at compile time and not run time.

In sub dijkstra_sssp the lines 

local *by_total_edges = sub {
   return 1 if $total_edges{$a} eq OO;
   return -1 if $total_edges{$b} eq OO;
   return $total_edges{$a} <=> $total_edges{$b};
};

define a nested subroutine. The need for the use of local is to avoid creating a closure. The perlref manpage describes the situation: ... named subroutines do not nest properly and should only be declared in the main package scope. This is because named subroutines are created at compile time so their lexical variables get assigned to the parent lexicals from the first execution of the parent block. If a parent scope is entered a second time, its lexicals are created again, while the nested subs still reference the old ones.

Anonymous subroutines get to capture each time you execute the sub operator, as they are created on the fly. If you are accustomed to using nested subroutines in other programming languages with their own private variables, you'll have to work at it a bit in Perl.  The intuitive coding of this type of thing incurs mysterious warnings about "will not stay shared" due to the reasons explained above.

So by using local what we are doing is creating a temporary (i.e. re-created for each call to the enclosing subroutine) assignment of the anonymous subroutine. This has normal access to the lexical variables from the enclosing scope at the time it is is invoked.

This has the interesting effect of creating a function local to another function, something not normally supported in Perl. I'll freely admit this is definitely not idiomatic Perl. I'd argue that stylistically, in any language, this should be the preferred way of organizing this code. The small function used for sorting vertices has no use outside of sub dijkstra_sssp. Indeed I could imagine many programmers just inlining the code in the call to sort! This use of local is essentially the same thing, but cleaner looking than inlining. The only complication is that perl will complain with

Name "main::by_total_edges" used only once: possible typo at perl5/ch-2.pl line 41.

unless we add the line no warnings "once"; I am still investigating this. At face value the warning makes no sense: by_total_edges is called from within a loop. Is this a slight defect in the interpreter to not recognize this? I'll write up what I learn about this another time.

Notes

  • If you thought the use of the Unicode character '∞' was unnecessary you are correct! I was hoping that it would be an acceptable identifier and that line could have read use constant ∞ => "INF"; or something like that where I could then have used in the code almost like a numeric literal. Sadly this is not an acceptable identifier. (Although I think it should be!)
  • The Graph module has an interesting history, it was originally written as part of the code samples for the book Mastering Algorithms with Perl. That book has served as an excellent reference for many years.