GitHub - alexanderbailey/FISOFS: Mathematical software for computing cofinite subsemigroups of free semigroups

Branches Tags
Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
subsgps/n-2		subsgps/n-2
COPYING		COPYING
README		README
get_poly.cpp		get_poly.cpp
join.cpp		join.cpp
next_index.cpp		next_index.cpp
parallel_join.cpp		parallel_join.cpp
parallel_join_config.cpp		parallel_join_config.cpp
read_file.cpp		read_file.cpp
Repository files navigation

-----------------------------------------------------------------------------
-                                                                           -
-   Copyright (C) 2014 Alex Bailey                                          -
-                                                                           -
-   'FISOFS' is free software: you can redistribute it and/or modify it     -
-   under the terms of the GNU General Public License as published by       -
-   the Free Software Foundation, either version 3 of the license, or       -
-   (at your option) any later version.                                     -
-                                                                           -
-   'FISOFS' is distributed in the hope that it will be useful, but         -
-   WITHOUT ANY WARRANTY; without even the implied warranty of              -
-   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the           -
-   GNU General Public License for more details.                            -
-                                                                           -
-   A copy of the GNU General Public License is available in the file       -
-   'COPYING'; or for later versions see <http://www.gnu.org/licenses/>.    -
-                                                                           -
-   You can contact the author through the website:                         -
-   alexanderbailey.github.io                                               -
-                                                                           -
-----------------------------------------------------------------------------

This is the README file for 'FISOFS'. The project website can be found at:

alexanderbailey.github.io/software

'FISOFS' is mathematical software for computing Finite (Rees) Index Subsemigroups Of Free Semigroups. Background material, pseduo-code and justification of the algorithms can be found in the paper [arxiv.org/abs/1409.2444].

--------
Contents
--------

After extraction, the folder FISOFS should contain the directories

  subsgps
  subsgps/n-2

the source code files

  next_index.cpp
  join.cpp
  parallel_join_config.cpp
  parallel_join.cpp
  get_poly.cpp
  read_file.cpp

the data files

  subsgps/n-2/n-2-k-1-i-1.bit8
  subsgps/n-2/n-2-k-2-i-1.bit8
  subsgps/n-2/n-2-k-2-i-2.bit8

the license file COPYING and this README file.

++++++++++
next_index
++++++++++

This program takes four command line parameters: n = index, k = support, p = number of parallel jobs, q = which job to run now.

It reads in the relevant files containing the subsemigroups of index n-1, support k-1 and support k and calculates the subsemigroups of index n and support k, saving them in distinct output files for each of their orbit sizes.

It saves the files appended with "-q" where q is the job number, unless p=q=1 in which case the file names are not appended.

++++
join
++++

This takes three command line parameters: n = index, k = support, i = orbit size.

This reads in all files from parallel jobs with index n, support k and orbit size i, takes their union and saves it as one output file without an appended file name.

If the character "A" is used instead of the orbit size, it will loop through all the orbit sizes and join each of the files.

++++++++++++++++++++
parallel_join_config
++++++++++++++++++++

This takes four command line parameters: n = index, k = support, i = orbit size, p = number of parallel_jobs.

Sometimes the joining of files requires too much memory if the filesizes are large. Since the semigroups in the files are saved in order, we can split the files up in such a way that enables us to union parts of the files and then concatenate our results together later. This program analyses the files and creates a configuration file detailing how to split up the files for each parallel job.

+++++++++++++
parallel_join
+++++++++++++

This takes four command line parameters: n = index, k = support, i = orbit size, q = which job to run now

This joins the files based on a configuration file created by parallel_join_config.

++++++++
get_poly
++++++++

This takes one command line parameter: n = index.

Once all the files for index n, support 1 <= k <= n have been computed, this calculates the polynomial a_n(FS_r).

+++++++++
read_file
+++++++++

This takes five command line parameters: n = index, k = support, i = orbit size, p = number of chunks, q = which chunk to read now

This splits a file in to p chunks and outputs the set of gaps of the semigroups in the q-th chunk. The output is in two formats: as the words in the free semigroup and as the shortlex encoded positions of these words.

---------
Compiling
---------

The files can be compiled using any standard C++ compiler. For example:

g++ next_index.cpp -o next_index -O3

It is recommended to use the O3 flag for optimisation.

----------------------
Example for next_index
----------------------

The algorithm is recursive. The files for n=2 are already given in the folder 'subsgps/n-2', from which we can create all files needed for n=3, which are needed for calculating the files for n=4 etc. If you don't have the necessary files with the right file names in the correct folders, it will produce undefined behaviour!

If we start by running the commands

./next_index 3 1 1 1
./next_index 3 2 1 1
./next_index 3 3 1 1

This will calculate the subsemigroups of index 3 and supports 1 through 3. Note, the subsemigroups of support 1 are precisely the numerical semigroups of genus 3 (for a more efficient algorithm for calculating numerical semigroups see the NumericalSgps package for GAP).

As the index gets higher, it may be of advantage to split the calculations in to parallel jobs.
For example, assuming that you have already computed the files for index 5 and support 3 and 4, we can compute in parallel the files for index 6 and support 4 with the following commands:

./next_index 6 4 10 q

where q ranges from 1 to 10. This will produce output files appended by numbers from 1 to 10.

----------------
Example for join
----------------

If we have run parallel jobs for next_index we will have multiple files for each orbit size appended by the job number. We need to take the union of these files. Extending the example above, we see that the output files have orbit sizes 4, 6, 8, 12 and 24, so we would run the commands:

./join 6 4 4
./join 6 4 6
./join 6 4 8
./join 6 4 12
./join 6 4 24

or alternatively we can run one command:

./join 6 4 A

which will join the files for each of the orbit sizes.
We can then delete the smaller files, (e.g. rm subsgps/n-6/n-6-k-4-*bit*-*)

----------------------------------
Example for parallel join(_config)
----------------------------------

The join program just reads in all relevant files and takes their union. This can easily require more memory than available for large files. In this case we split the union task up in to smaller jobs. Say for example we want to join the 10 following files:

subsgps/n-6/n-6-k-4-i-24.bit24-01
subsgps/n-6/n-6-k-4-i-24.bit24-02
subsgps/n-6/n-6-k-4-i-24.bit24-03
subsgps/n-6/n-6-k-4-i-24.bit24-04
subsgps/n-6/n-6-k-4-i-24.bit24-05
subsgps/n-6/n-6-k-4-i-24.bit24-06
subsgps/n-6/n-6-k-4-i-24.bit24-07
subsgps/n-6/n-6-k-4-i-24.bit24-08
subsgps/n-6/n-6-k-4-i-24.bit24-09
subsgps/n-6/n-6-k-4-i-24.bit24-10

and split the union task in to 5 smaller jobs. We would run the command

./parallel_join_config 6 4 24 5

This creates a configuration file called "join-config-n-6-k-4-i-24.bit24" and gives an output of the memory required for each of the five jobs and the maximum memory required out of all the jobs. (DISCLAIMER: this is not the actual memory required for the job but the size of the total data that will be read in from the input files. The actual required memory will be more than this, but it is a good benchmark.) If the maximum memory required seems too high, we can increase the number of jobs. Note that unlike 'join', we cannot replace 24 by A, we must do each orbit size individually. This configuration file is then used for the next set of commands:

./parallel_join 6 4 24 q

where q ranges from 1 to 5. We now have the following files:

subsgps/n-6/joined-n-6-k-4-i-24.bit24-1
subsgps/n-6/joined-n-6-k-4-i-24.bit24-2
subsgps/n-6/joined-n-6-k-4-i-24.bit24-3
subsgps/n-6/joined-n-6-k-4-i-24.bit24-4
subsgps/n-6/joined-n-6-k-4-i-24.bit24-5

We concatenate these files to get our final file:

cat subsgps/n-6/joined-n-6-k-4-i-24.bit24-* > subsgps/n-6/n-6-k-4-i-24.bit24

and then we can delete our configuration file and all other files, e.g:

rm join-config-n-6-k-4-i-24.bit24
rm subsgps/n-6/joined-n-6-k-4-i-24.bit24-*
rm subsgps/n-6/n-6-k-4-i-24.bit24-*

--------------------
Example for get_poly
--------------------

Once we have calculated and joined all files for a particular index we can then calculate the polynomial a_n(FS_r). For example, for index 6 we run the command:

./get_poly 6

This reads the filesizes of the files in the folder 'subsgps/n-6' and outputs the polynomial.

---------------------
Example for read_file
---------------------

We may want to display the semigroups in a particular file, but we may not want to display all of them (it may be a big file!) For example if we want to read the first few semigroups in the file 'subsgps/n-6/n-6-k-4-i-24.bit24'. The command:

./read_file 6 4 24 5000 1

will give the output:

File: subsgps/n-6/n-6-k-4-i-24.bit24, filesize = 690030, number of semigroups = 46002
Reading chunk 1 of 5000
Reading semigroups 0 to 9, from byte 0 to 135
0 1 2 3 4 10 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 3 ]
0 1 2 3 4 26 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 1 2 3 ]
0 1 2 3 4 38 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 1 3 ]
0 1 2 3 4 42 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 2 3 ]
0 1 2 3 4 44 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 3 1 ]
0 1 2 3 4 45 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 3 2 ]
0 1 2 3 4 46 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 3 3 ]
0 1 2 3 4 47 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 3 4 ]
0 1 2 3 4 90 [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 1 1 2 3 ]


Here [ 1 ][ 2 ][ 3 ][ 4 ][ 1 1 ][ 2 3 4 ] represents the semigroup with set of gaps {a,b,c,d,a^2,bcd} and 0 1 2 3 4 47 are the respective shortlex positions of each of these words (this is how it is saved in the file).

----------------
General comments
----------------

There are basically no checks for valid input, so if you give any of the programs the wrong number of command line parameters, or nonsensical parameters it will almost certainly do something undesired or produce an error.

The programs are very much dependent on the locations and names of the data files. Don't move them or put strange files in the 'subsgps' folders!

Enjoy! Please feel free to email me any bugs, questions or comments etc.