Normalizing a URL with Parse::Yapp

For Perl Weekly Challenge 017 I used Parse::Yapp to detect and print the components of a URL. To normalize a URL, much of the same code can be re-used: the URL is parsed in the same way, the difference being that rather than printing the components we apply the various normalization rules to them.

The Code

Wrapper

All the logic is really in the Grammar file which is called by the small wrapper program.
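To give a sense of what the grammar actions might do instead of printing, here is a sketch of a normalization routine applied to the recognized components. The sub name and the exact rule set (typical RFC 3986 style normalizations) are illustrative and are not the contents of my grammar file.

# Illustrative normalization rules the grammar actions could apply to the
# recognized components instead of printing them; not my actual grammar file.
sub normalize_components {
    my (%url) = @_;

    $url{scheme} = lc $url{scheme};      # the scheme is case-insensitive
    $url{host}   = lc $url{host};        # so is the host name

    # drop the port when it is the default for the scheme
    my %default_port = (http => 80, https => 443, ftp => 21);
    delete $url{port}
        if defined $url{port}
        && defined $default_port{ $url{scheme} }
        && $url{port} == $default_port{ $url{scheme} };

    # an empty path is normalized to "/"
    $url{path} = '/' if !defined $url{path} || $url{path} eq '';

    # upper-case the hex digits of percent-encoded triplets
    $url{path} =~ s/%([0-9a-fA-F]{2})/'%' . uc($1)/ge;

    return %url;
}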


Implementing a spigot algorithm for the digits of e

For Perl Weekly Challenge 004 I showed how the digits of π could be generated using a so-called spigot algorithm, which generates the digits one at a time, as if they were flowing like water from a spigot. These algorithms were first described not for π but for Euler's number e in 1968 by Sale [1].

The algorithm

Sale's paper is linked to below and, unlike a lot of academic articles, is freely available. The paper is two pages long and is pretty digestible. Clearly it is a product of its time (1968) but the main points are timeless. The core idea is that if you examine the standard computation for e 

e = 1/0! + 1/1! + 1/2! + 1/3! + ...

you see that after the second term all subsequent terms are less than 1. If you pick the number of digits you want ahead of time you can iteratively compute each term (starting from the "right side") and multiply it by 10 to generate an integer part, which is the digit you seek, along with the remainder which is used to compute the next digit. Each digit is computed in this way.

Both the Perl 5 and Perl 6 implementations proceed roughly the same way.

  1. The number of terms required ($m) is computed using logarithms to ease computation, as is done in the paper.
  2. The initial list of values for each term is set to be all 1s.
  3. The computation of digits is performed in a loop, with each digit stored for output. How this is handled is the only significant difference between the Perl 5 and Perl 6 implementations. In Perl 5 the digits are saved to an array which is returned and then printed. In Perl 6 they are sent to a channel and printed in batches. That is, my Perl 5 code computes the digits, prints them, and exits, while my Perl 6 code runs in an infinite loop continuously printing digits. (A sketch of these steps follows the list.)
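
A sketch of these steps in Perl 5 looks something like the following. The structure follows the description above, but the variable names are illustrative rather than being the actual ch-1.pl.

# A sketch of the three steps above; not the actual ch-1.pl.
use strict;
use warnings;

sub e_digits {
    my ($n) = @_;

    # 1. choose the number of terms so that log10(m!) exceeds n + 1,
    #    i.e. the dropped tail cannot disturb the requested digits
    my ($m, $log_fact) = (1, 0);
    while ($log_fact <= $n + 1) {
        $m++;
        $log_fact += log($m) / log(10);
    }

    # 2. e - 2 written in the mixed-radix system 1/2!, 1/3!, ... has every digit equal to 1
    my @a = (1) x ($m + 1);

    # 3. each decimal digit is the carry out of the leftmost place after
    #    multiplying the whole mixed-radix fraction by 10
    my @digits;
    for (1 .. $n) {
        my $carry = 0;
        for my $i (reverse 2 .. $m) {
            my $x = $a[$i] * 10 + $carry;
            $a[$i] = $x % $i;            # remainder stays in place i
            $carry = int($x / $i);       # carry moves one place left
        }
        push @digits, $carry;
    }
    return @digits;
}

my $n = shift(@ARGV) // 50;
print '2.', join('', e_digits($n)), "\n";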

Perl 5

Sample Run

$ perl perl5/ch-1.pl 50
2.71828182845904523536028747135266249775724709369995

Perl 6

Sample Run

Discussion

Once the Perl 5 version was done I thought it would be worthwhile to try something different in the Perl 6 version. How about creating a true spigot! Meaning, one that runs continuously. Sale's algorithm is not perfectly suited for this because it starts "from the right" with the number of digits fixed ahead of time, so to create an infinite generator we need to redo a lot of computation. For each computed batch we only print out the net new digits.

A gather loop is the construct used to create the infinite generator, along with take; before each take the code checks whether the channel is empty, and if it is, the next batch of 50 digits is computed. The choice of 50 here was arbitrary.

References

[1] A. H. J. Sale, The Calculation of e to Many Significant Digits, The Computer Journal, Volume 11, Issue 2, August 1968, Pages 229–230

entier(s)

I was reading an older (1968) computer science paper [1] and came across a term I had never seen before. The word is entier and appeared in the excerpted text shown below.

The Computer Journal, Volume 11, Issue 2, August 1968, Pages 229–230

The definition of entier is "the greatest integer not exceeding the specified number" or, more simply, "the integral part of a number".

I have only ever seen this described as the "integral part" or "integer part", which would clearly seem to be the preferred modern terminology. I wonder when entier fell out of use? Perhaps it was considered archaic even in 1968?

References

[1] A. H. J. Sale, The Calculation of e to Many Significant Digits, The Computer Journal, Volume 11, Issue 2, August 1968, Pages 229–230

Perl Weekly Challenge 020

This week I wrote some Perl 6 code for the first time. I did a pretty straight lift of the Perl 5 code, which I wrote first, into Perl 6. When Perl 5 code uses a lot of perl idioms we call it perlish or perly. Should we call Perl 6 that follows that language's idioms something else? In honor of the Rakudo compiler, Rakish perhaps? I only bring this up to have an easy way of saying that, by simply translating Perl 5 to Perl 6, my code is probably not particularly rakish. Frankly, probably not that perlish either!

Although not necessary, I decided from the start to use Threads for Part 2. This will allow the easily parallelizable elements of the code, the factoring of many numbers, to be split across multiple CPU cores. In my Perl 6 code I used essentially the same approach but with Promises.

Finally, I'll note that, like last week, for this week's challenges I was able to re-use some code from a previous week: pack/unpack was used in Week 7 and the factoring code is from Week 8.

Part 1

Perl 5

Sample Run

$ perl perl5/ch-1.pl AABBBCDDE
AA
BBB
C
DD
E

Perl 6

Sample Run

$ perl6 perl6/ch-1.rk AABBBCDDE
AA
BBB
C
DD
E

Notes On Part 1

The Perl 5 and Perl 6 versions look essentially identical! The only differences are extremely minor. The command line arguments in Perl 6 are in an array named @*ARGS, and in order to split a string into a list of characters we need to pass Perl 6's split an empty string; this does pretty much what you would expect, but it also adds an empty string at the start and end of the list, which needs to be accounted for.
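
For reference, the grouping itself can be sketched in a few lines of Perl 5. This is the general idea rather than the exact ch-1.pl.

# Group consecutive identical characters; a sketch, not the actual ch-1.pl.
use strict;
use warnings;

my $input = $ARGV[0] // 'AABBBCDDE';
my @chars = split //, $input;        # Perl 5 split on the empty pattern
exit 0 unless @chars;

my @group = (shift @chars);
for my $c (@chars) {
    if ($c eq $group[-1]) {
        push @group, $c;             # same character, extend the current run
    }
    else {
        print join('', @group), "\n";
        @group = ($c);               # different character, start a new run
    }
}
print join('', @group), "\n";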

Part 2

Perl 5

Sample Run

$ perl perl5/ch-2.pl
First amicable pair of numbers: 220 284

Perl 6

Sample Run

$ perl6 perl6/ch-2.rk
First amicable pair of numbers: 220 284

Notes On Part 2

The approach here is to factor and sum the factors of batches of numbers, each batch handled by an independent worker, either a separate thread or a Promise. When these workers complete, their results are added to a hash mapping numbers to their factor sums, and this hash is then reviewed for amicable numbers. When the first pair is found the loop, and the program itself, exits.
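
A sketch of that batch-and-worker structure using Perl 5 threads is shown below. The batch size, worker count, and sub names are illustrative, not the original ch-2.pl.

# A sketch of the batched, threaded search; not the original ch-2.pl.
use strict;
use warnings;
use threads;

use constant { BATCH_SIZE => 500, WORKERS => 4 };

# sum of the proper divisors of $n
sub factor_sum {
    my ($n) = @_;
    my $sum = 0;
    for my $d (1 .. int($n / 2)) {
        $sum += $d if $n % $d == 0;
    }
    return $sum;
}

# each worker handles one batch and returns a list of number => sum pairs
sub factor_batch {
    my ($lo, $hi) = @_;
    return map { $_ => factor_sum($_) } $lo .. $hi;
}

my %sums;
my $start = 2;
while (1) {
    # hand the next few batches to worker threads
    my @workers;
    for my $w (0 .. WORKERS - 1) {
        my $lo = $start + $w * BATCH_SIZE;
        push @workers, threads->create(\&factor_batch, $lo, $lo + BATCH_SIZE - 1);
    }
    for my $t (@workers) {
        my %batch = $t->join();
        @sums{ keys %batch } = values %batch;
    }
    $start += WORKERS * BATCH_SIZE;

    # review the accumulated factor sums for an amicable pair
    for my $n (sort { $a <=> $b } keys %sums) {
        my $partner = $sums{$n};
        next if $partner <= $n || !exists $sums{$partner};
        if ($sums{$partner} == $n) {
            print "First amicable pair of numbers: $n $partner\n";
            exit;
        }
    }
}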

The Perl 5 and Perl 6 versions again look similar. Promises offer an interface similar to creating a thread. Although they differ in many ways, for all practical purposes the same effect is achieved: computation is split across the available CPU cores, as verified by my Mac's CPU monitor.

Activity across all cores.

The second and third cores are mostly idle, but during the brief run of ch-2.rk a quick glance at the CPU monitor verifies their use. In a similar way you could also check the MoarVM process's usage in top.

References

I highly recommend Andrew Shitov's Perl 6 at a Glance. This book is my primary reference for Perl 6 and is an excellent resource to have available to you if you already know Perl 5 and would like to get started with Perl 6. 

Perl Weekly Challenge 019

For this week's challenges I was able to re-use some code from a previous week. The "day of week" calculation using the Sakamoto method from Week 13 got to be re-used for Part 1. 

Part 1

Problem Statement

Write a script to display months from the year 1900 to 2019 where you find five weekends.

Solution

Sample Run

$ perl perl5/ch-1.pl
1901 March
1902 August
1903 May
1904 January
1904 July
1905 December
1907 March
.
.
.
2016 July
2017 December
2019 March

Discussion

The code above is cropped to fit better here. What is missing are the use constant declarations for the months and days of the week. I like using named constants for these sorts of values; it makes the code much more readable, I feel. The only downside is that it makes the code slightly longer. The problem statement defined a weekend as Friday, Saturday, and Sunday, so the essence of my approach was to iterate over every day of the month and save those three days to array references stored in a hash keyed by year and month. When complete, if the number of days in the array reference is divisible by five (each of the three days can occur at most five times in a month, so a total divisible by five means all three occurred five times) then we know that month has five complete weekends. Any month found this way is printed out.
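
A condensed sketch of this approach is shown below. It reuses Sakamoto's day-of-week formula but simply counts Fridays, Saturdays, and Sundays per month rather than storing them in a hash of array references, so it is not the original ch-1.pl.

# Condensed sketch: count Fri/Sat/Sun per month; not the original ch-1.pl.
use strict;
use warnings;
use constant { SUNDAY => 0, FRIDAY => 5, SATURDAY => 6 };

my @month_names   = qw(January February March April May June
                       July August September October November December);
my @days_in_month = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31);

# Sakamoto's method: returns 0 (Sunday) through 6 (Saturday)
sub day_of_week {
    my ($y, $m, $d) = @_;
    my @t = (0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4);
    $y-- if $m < 3;
    return ($y + int($y / 4) - int($y / 100) + int($y / 400) + $t[$m - 1] + $d) % 7;
}

sub is_leap {
    my ($y) = @_;
    return ($y % 4 == 0 && $y % 100 != 0) || $y % 400 == 0;
}

for my $year (1900 .. 2019) {
    for my $month (1 .. 12) {
        my $days = $days_in_month[$month - 1];
        $days = 29 if $month == 2 && is_leap($year);
        my $count = 0;
        for my $day (1 .. $days) {
            my $dow = day_of_week($year, $month, $day);
            $count++ if $dow == FRIDAY || $dow == SATURDAY || $dow == SUNDAY;
        }
        # 15 means five of each: five complete Friday-Saturday-Sunday weekends
        print "$year $month_names[$month - 1]\n" if $count == 15;
    }
}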

Part 2

Problem Statement

Write a script that can wrap a given paragraph at a specified column using the greedy algorithm approach.

Sample Run

$ perl perl5/ch-2.pl
A simple way to do word wrapping is to use a greedy algorithm that puts as many words on a line as possible, then moving on to the next line to do the same until there are no more words left to place. This method is used by many modern word processors, such as OpenOffice.org Writer and Microsoft Word. This algorithm always uses the minimum possible number of lines but may lead to lines of widely varying lengths.

Discussion

The line length is set as a constant and the text to be wrapped is kept in the data section. The text itself is the definition of the greedy algorithm approach from Wikipedia. Words are put in an array and printed out by The Price Is Right rules: the total length of the words on a line can be equal to, but never more than, the maximum line length.
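
A minimal sketch of the greedy wrapping is shown below. The wrap width and the (truncated) __DATA__ text here are placeholders rather than the values in the original ch-2.pl.

# Greedy word wrap sketch; the constant value and text are placeholders.
use strict;
use warnings;
use constant LINE_LENGTH => 60;

my @words = split ' ', join ' ', <DATA>;
my $line  = '';
for my $word (@words) {
    my $candidate = $line eq '' ? $word : "$line $word";
    if (length($candidate) <= LINE_LENGTH) {
        $line = $candidate;              # the word still fits on this line
    }
    else {
        print "$line\n";                 # line is full, start the next one
        $line = $word;
    }
}
print "$line\n" if length $line;

__DATA__
A simple way to do word wrapping is to use a greedy algorithm that puts as
many words on a line as possible, then moving on to the next line to do the
same until there are no more words left to place.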

Perl Weekly Challenge 018

So much use of Class::Struct!

I never really used Class::Struct before. When I needed a simple object Perl's basic object system was always fine, and when I needed a slightly more complex object Perl's basic object system was still fine. I am no stranger to how other languages organize their object oriented constructs, and something in this week's challenge made me recall the use of C structs. Once I started using Class::Struct in a way that made sense I started to see just how much I could do with this convenient module. The code that follows has an interesting, not particularly Perlish look to it: almost a Java-like style, because of what might be called "inner classes" (inner structs?).
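
As a generic illustration of that nested style (these are not the structs from my actual solution), Class::Struct lets one struct serve as the element type of another:

# A generic nested-struct illustration; not the structs used in my solution.
use strict;
use warnings;
use Class::Struct;

# an "inner" struct used as the type of two elements of the outer one
struct Point => {
    x => '$',
    y => '$',
};

struct Segment => {
    from => 'Point',
    to   => 'Point',
};

my $s = Segment->new(
    from => Point->new(x => 0, y => 0),
    to   => Point->new(x => 3, y => 4),
);

printf "length: %g\n",
    sqrt(($s->to->x - $s->from->x) ** 2 + ($s->to->y - $s->from->y) ** 2);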

Part 1

I used what are called Suffix Arrays to solve this week's Part 1. Suffix Arrays are a significant topic and I will have to defer an in-depth discussion of their use for another time. In particular, my implementation, SuffixArray.pm, is a lengthy file which is not easily included in this blog, so I am only linking to it from here.
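
To give a flavor of the idea without the full SuffixArray.pm, here is a compact (and naive, O(n^2 log n)) sketch: build a suffix array over the two strings joined by a separator, then take the longest common prefix of adjacent suffixes that start in different strings.

# A compact suffix-array sketch; not the SuffixArray.pm used in my solution.
use strict;
use warnings;

sub suffix_array {
    my ($s) = @_;
    return sort { substr($s, $a) cmp substr($s, $b) } 0 .. length($s) - 1;
}

sub common_prefix_length {
    my ($x, $y) = @_;
    my $n = 0;
    $n++ while $n < length($x) && $n < length($y)
            && substr($x, $n, 1) eq substr($y, $n, 1);
    return $n;
}

sub longest_common_substring {
    my ($s1, $s2) = @_;
    my $text = $s1 . "\x00" . $s2;      # assumes "\x00" never occurs in the input
    my @sa   = suffix_array($text);
    my ($best, $len) = ('', 0);
    for my $i (0 .. $#sa - 1) {
        my ($p, $q) = @sa[ $i, $i + 1 ];
        # only compare suffixes that start in different input strings
        my $p_in_first = $p < length($s1) ? 1 : 0;
        my $q_in_first = $q < length($s1) ? 1 : 0;
        next if $p_in_first == $q_in_first;
        my $lcp = common_prefix_length(substr($text, $p), substr($text, $q));
        ($best, $len) = (substr($text, $p, $lcp), $lcp) if $lcp > $len;
    }
    return $best;
}

my ($x, $y) = @ARGV == 2 ? @ARGV : ('ABCAA', 'BABC');
print longest_common_substring($x, $y), "\n";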

Sample Run

$ perl -I. -MSuffixArray ch-1.pl ABCAA BABC
ABC

Part 2

ch-2.pl

Perl Weekly Challenge 017

Parsing with Parse::Yapp and using the Bhagavad Gita API.

Back in Week 10 I decided to handle Roman numerals with a byacc based parser.
This week I used a very similar but slightly less cumbersome approach and used Parse::Yapp to parse and print the components of a URL.

Also, while I do not usually take the extra time to do the optional API challenges, this week I felt like maybe I should give it a try.

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks shown link to GitHub Gists. (I prefer to use the gists rather than linking to the code in the main repo since it seems that the gist URLs are more permanent. The main repo is a fork which might get deleted or substantially re-organized at some point.)

Part 1

Sample Run

$ perl perl5/ch-1.pl
4

Discussion

This part of the week's challenge was a nice reminder of recursion in Perl, something which I have spent a lot of time on in the past but haven't done much with recently. I also started putting the main part of the script in a named block, as is done by fellow weekly challenge participant Athanasius. I am in the habit of marking this section of code with a big multiline comment, but after looking at some of his code recently I really like this approach!
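
The named-block layout looks like this; the recursive function here is just a stand-in to show the style, not the actual ch-1.pl.

# The named-block style with a placeholder recursive sub; not the actual ch-1.pl.
use strict;
use warnings;

sub factorial {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * factorial($n - 1);
}

MAIN: {
    my $n = shift(@ARGV) // 5;
    print factorial($n), "\n";
}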

Part 2

Sample Run 

$ perl perl5/ch-2.pl
SCHEME: jdbc
USER: user
PASSWORD: password
HOST: localhost
PORT: 3306
PATH: /pwc
QUERY: profile=true
FRAGMENT: h1

Discussion

The code above is, of course, really just calling the functions defined in UrlParser.pm, which is generated by yapp from UrlGrammar.yp. The steps involved are:

  1. Define a grammar in UrlGrammar.yp. Unless you really prefer writing your lexing functions elsewhere include them in this file as well.
  2. Execute yapp -m UrlParser UrlGrammar.yp. This generates the parser and saves it to a file named UrlParser.pm. Specifying an output name with the -m option allows you to give the generated parser a name which may make more sense than the default, which in this case would be UrlGrammar.pm. UrlGrammar.yp is shown below.
  3. Write a wrapper to the parser, as I have done in ch-2.pl. Mine here is very small and straightforward since we're just parsing one string. Larger projects will need more code to, say, stream larger bodies of text.
  4. Optional. Execute yapp -v UrlGrammar.yp. This generates UrlGrammar.output which includes the rules and states used by the parser. This also shows any potential parser conflicts. Reviewing this file may be helpful.
  5. Optional. Use UrlGrammar.output to visualize the grammar with GraphViz::Parse::Yapp.

The grammar above was informed both by the problem statement in the weekly challenge and also a loosely similar URL grammar that I found.


Script to generate the grammar visualization.


Visualization of the grammar.

If you review the code (and even just the diagram above!) a little closely you can see that this approach was somewhat overkill for this problem. The URL components are essentially all determined in lexer(). That function can be described as gnawing away at the URL: it works from left to right, matching what it can, removing what has been matched, and then repeating the process until no input remains. Just by looking at the diagram of the grammar you can see that each of the components is a token expected to be passed in from lexer().
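
A rough sketch of that gnawing style is shown below: each call matches one component at the front of the remaining input, removes it, and returns a (TOKEN, value) pair, with ('', undef) signalling end of input as Parse::Yapp expects. The token names and regexes are illustrative, not the actual lexer() in UrlGrammar.yp.

# Illustrative "gnawing" lexer; not the actual lexer() in UrlGrammar.yp.
use strict;
use warnings;

our $remaining = $ARGV[0]
    // 'jdbc://user:password@localhost:3306/pwc?profile=true#h1';

sub lexer {
    return ('', undef) if $remaining eq '';

    return ('SCHEME',   $1) if $remaining =~ s{^([a-z][a-z0-9+.-]*)://}{}i;
    return ('USERINFO', $1) if $remaining =~ s{^([^/?#@]+)@}{};
    return ('HOST',     $1) if $remaining =~ s{^([^:/?#]+)}{};
    return ('PORT',     $1) if $remaining =~ s{^:(\d+)}{};
    return ('PATH',     $1) if $remaining =~ s{^(/[^?#]*)}{};
    return ('QUERY',    $1) if $remaining =~ s{^\?([^#]*)}{};
    return ('FRAGMENT', $1) if $remaining =~ s{^#(.*)}{};
    die "unrecognized input: $remaining\n";
}

# drive the lexer by hand to show the token stream it would feed the parser
while (1) {
    my ($token, $value) = lexer();
    last if $token eq '';
    print "$token: $value\n";
}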

The defined grammar itself is really just providing a framework to keep everything organized and conveniently print the components as they are recognized. 

An alternative implementation might just have much of the same code as in the lexer() function on its own or attempt to use a single monolithic regex. I would argue that the approach I took is straightforward, very maintainable, and might be more easily extended if we wanted to go beyond just parsing any single URL at a time.

For anyone interested in exploring parsers in depth, from a utilitarian perspective, I highly suggest Pro Perl Parsing. That book contains a great overview of parsing in general and also explores several specific tools, including Parse::Yapp.

Part 3

Sample Run

$ perl perl5/ch-3.pl 18 61
O Arjuna, the Lord resides in the region of the heart of all creatures, revolving through Maya all the creatures (as though) mounted on a machine!

Discussion

I haven't been taking the extra time for the optional API challenges, however, this one changed my mind. As a practitioner of bhakti-yoga in the Gaudiya Vaishnava tradition I took this challenge as a sign!

The Bhagavad Gita API requires you to register yourself and then, once you are logged in, to enroll individual applications which will use the API. I created a single application which I called PerlWeeklyChallenge17. After doing this I was given a client id and client secret. These are then used to obtain an authorization token which must accompany each API call. Authorization tokens are good for 300 seconds. In my code above I am not tracking the token lifetimes, however.
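
The overall flow can be sketched with LWP::UserAgent and JSON::PP as below. The endpoint URLs, parameter names, and response field names are placeholders (the real ones come from the Bhagavad Gita API documentation); this is not the original ch-3.pl.

# Token-then-request flow sketch; endpoints and field names are placeholders.
use strict;
use warnings;
use LWP::UserAgent;
use JSON::PP qw(decode_json);

my ($client_id, $client_secret) = @ENV{qw(BG_CLIENT_ID BG_CLIENT_SECRET)};
my ($chapter, $verse) = @ARGV;

my $ua = LWP::UserAgent->new();

# exchange the client id and secret for an access token (good for 300 seconds)
my $auth = $ua->post(
    'https://api.example.com/auth/token',            # placeholder URL
    {
        client_id     => $client_id,
        client_secret => $client_secret,
        grant_type    => 'client_credentials',
    },
);
die 'token request failed: ', $auth->status_line unless $auth->is_success;
my $token = decode_json($auth->decoded_content)->{access_token};

# request a single verse, sending the token along with the call
my $resp = $ua->get(
    "https://api.example.com/chapters/$chapter/verses/$verse"   # placeholder URL
    . "?access_token=$token",
);
die 'verse request failed: ', $resp->status_line unless $resp->is_success;
print decode_json($resp->decoded_content)->{translation}, "\n"; # field name is a guess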

I really enjoyed this part of the challenge. There is a lot of room to really have fun with this. Tracking token lifetimes, as mentioned above, is an obvious first enhancement. Another might be to add flexibility in displaying the original texts, not just the translations which I chose to do for the sake of simplicity. 

Finally, I am unsure of the sources used by the Bhagavad Gita API. I would suggest that for any serious reading one use Bhagavad-gītā As It Is.

Perl Weekly Challenge 016

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks shown link to GitHub Gists. (I prefer to use the gists rather than linking to the code in the main repo since it seems that the gist URLs are more permanent. The main repo is a fork which might get deleted or substantially re-organized at some point.)

Part 1

Sample Run

(shown are the person, amount eaten, and amount remaining)

1 1 99
2 1.98 97.02
3 2.9106 94.1094
4 3.764376 90.345024
5 4.5172512 85.8277728
6 5.149666368 80.678106432
7 5.64746745024 75.03063898176
8 6.0024511185408 69.0281878632192
9 6.21253690768973 62.8156509555295
10 6.28156509555295 56.5340858599765
11 6.21874944459742 50.3153364153791
12 6.03784036984549 44.2774960455336
.
.
.
99 9.23929532895044e-39 9.33262154439444e-41
100 9.33262154439444e-41 0

Discussion

To simplify things I standardize the size of a single pie to be 100. 100 what? Let's call them pie units (not π!). 


Perl Weekly Challenge 015

Sometimes I work on these challenges a little bit at a time over the course of several days. Other times I do the work all in one shot in the dead of night right before the deadline after getting home from a long vacation. That is what happened this week and something similar happened last week as well. As a result I think my code has a certain ... look to it. As of this writing I have a few hours to finalize and clean up the code but I think I'll leave things as is as a reminder of the time and space in which the code was written!

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks shown link to GitHub Gists. (I prefer to use the gists rather than linking to the code in the main repo since it seems that the gist URLs are more permanent. The main repo is a fork which might get deleted or substantially re-organized at some point.)

Part 1

Sample Run

The Code

I re-used a substantial portion of the code for retrieving the pre-computed list of primes from my solution to Challenge 012. That part seems reasonable enough I suppose. The find_weak and find_strong functions are very straightforward; I don't think I would change anything there. Looking at how I shift primes out of the main list and create the test arrays seems a little clunky to me after some further review. Like I said in the introduction, I'll leave it as I wrote it late at night, but if I had more time I would have liked to use threads (maybe even just co-operative threads) to organize the code differently and have the lists of weak and strong primes filled independently. Having them both filled at the same time and then having to check to see if one is already full is, again, a little clunky for my tastes.
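
For reference, the comparisons at the heart of find_weak and find_strong are tiny: a prime is strong when it exceeds the average of its two neighbouring primes and weak when it falls below it. The helper subs below show just that test; they are not the original code.

# A prime is strong when it is greater than the mean of its neighbours and
# weak when it is less; these helpers are illustrative, not the original code.
sub is_strong {
    my ($prev, $p, $next) = @_;
    return $p > ($prev + $next) / 2;
}

sub is_weak {
    my ($prev, $p, $next) = @_;
    return $p < ($prev + $next) / 2;
}

# example: 11 is strong since (7 + 13) / 2 = 10, while 19 is weak since (17 + 23) / 2 = 20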

Part 2

Sample Run

$ perl perl5/ch-2.pl
ATTACKATDAWN is encoded as pxklroreseny
PXKLRORESENY is decoded as attackatdawn

The Code

The encode() and decode() functions come almost directly from the definition of the method, where encoding is C_i = M_i + K_i mod 26 and decoding is M_i = C_i - K_i mod 26, where C_i is a character of encrypted text (cypher text), M_i is a character of message text (plain text), and K_i is a character of the keyword. The example of ATTACKATDAWN is a classic and so I use it here to demonstrate that everything works as expected.
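
A minimal sketch of such an encode/decode pair is shown below. The keyword LEMON is the classic textbook example, not necessarily the keyword used in my script, and the sub names mirror the description rather than the original ch-2.pl.

# Minimal Vigenère encode/decode sketch; not the original ch-2.pl.
use strict;
use warnings;

sub encode {
    my ($message, $key) = @_;
    return transform($message, $key, +1);
}

sub decode {
    my ($cipher, $key) = @_;
    return transform($cipher, $key, -1);
}

# C_i = M_i + K_i mod 26 for encoding, M_i = C_i - K_i mod 26 for decoding
sub transform {
    my ($text, $key, $direction) = @_;
    my @t = split //, lc $text;
    my @k = split //, lc $key;
    my $out = '';
    for my $i (0 .. $#t) {
        my $m = ord($t[$i]) - ord('a');
        my $k = ord($k[$i % @k]) - ord('a');
        $out .= chr((($m + $direction * $k) % 26 + 26) % 26 + ord('a'));
    }
    return $out;
}

print encode('attackatdawn', 'lemon'), "\n";   # prints lxfopvefrnhr
print decode('lxfopvefrnhr', 'lemon'), "\n";   # prints attackatdawn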

Extra Information 

The Vigenère cipher, although it seems simple to a modern audience, was considered highly secure up through the 19th century. Successful attacks are mainly based on letter frequency analysis and determination of the keyword length. In particular the Kasiski Attack is an interesting case study in that it can be used against the Vigenère cipher and is representative of more sophisticated techniques. Perhaps an attack implementation and discussion will be something I will write more about in the future. Garrett's Making, Breaking Codes is an excellent text I would recommend for anyone interested in deeper treatment of these topics.

Perl Weekly Challenge 014

This week's challenge problems allowed for the nice use of some classic Perl idioms, especially so if you intentionally avoid the use of any CPAN modules which is what I usually do.

As with the previous weekly challenges the problem statements are short and are included in the first comment of the code. The code blocks shown link to GitHub Gists.

Part 1

ch-1.pl