# Parikh Matrices

Parikh Matrices are a newly developed tool for studying numerical properties of words in terms of their (scattered) subwords. They were introduced by Mateescu et al. in 2000 and continuously received attention from the research community ever since.

Below is a list of publication on this matter. You can also try the online Parikh Matrix calculator developed by Virgil Nicolae Șerbănuță.

Subword balance, position indices and power sums
Arto Salomaa
J. of CSS, Volume 76(8), pp 861-871. 2010
Abstract. In this paper, we investigate various ways of characterizing words, mainly over a binary alphabet, using information about the positions of occurrences of letters in words. We introduce two new measures associated with words, the position index and sum of position indices. We establish some characterizations, connections with Parikh matrices, and connections with power sums. One particular emphasis concerns the effect of morphisms and iterated morphisms on words.
J.CSS, BIB
On Parikh Matrices and Ambiguity
Virgil Serbanuta
PhD Thesis, University of Bucharest, April 2010
Abstract. The theory of subwords is a part of combinatorics of words dealing with properties of words related to classical non-commutative algebra. The main problem of this theory can be described as follows: “Can a word be computed knowing a part of the set of its subwords (with or without multiplicities)?”. This thesis investigates various ways to uniquely characterize words by the number of occurrences of special types of subwords. First, ways to reduce the set of subwords that need to be used are investigated. After this, the print of a word (obtained by replacing all sequences of identical letters by only one letter) is used to gain a deeper understanding of words that can be uniquely identiﬁed by the subwords considered and of the class of languages their language could belong. In the end these results are extended to larger sets of subwords, and the number of occurrences for any subword is identiﬁed to be a polynomial.
PDF, BIB
On Parikh Matrices, Ambiguity, and Prints
Virgil Nicolae Șerbănuță
IJFCS 20(1) pp 151-165. 2009
Abstract. In the algebraic study of words and languages it is often convenient to analyze words as numerical quantities. One such now-famous example is the study of words by using their attached Parikh mapping (or vector). However, the Parikh vector abstracts away too much of the structure of the word. Parikh matrices were introduced as a means to obtain more than just the number of occurrences of single letters. When studying properties of words, an important property is that a word is uniquely determined by the number of occurrences of certain predetermined subwords. In the context of Parikh matrices, the problem is translated in finding which words are completely characterized by their associated Parikh matrix. This paper links this problem with that of the print of a word, i.e. the word obtained by considering consecutive occurrences of the same letter as only one letter. We obtain results regarding finiteness and context-freeness of such classes of words.
PDF, DOI
Charasteristic Words for Parikh Matrices
Arto Salomaa
Automata, Formal Languages, and Related Topics, pp 117-127. 2009
Abstract. Characterizing words by numerical quantities eliminates the undesirable noncommutativity. The basic numerical quantity investigated in this paper is $\sca{u}$, the number of occurrences of a word $u$ as a scattered subword of a word $w$. Recently introduced Parikh matrices constitute a powerful tool in this investigation. To remove the ambiguity present in the matrices, we associate in this paper a unique word, the Lyndon image, to each matrix. Lyndon images are studied both in connection with matrices and as a language-theoretic notion. Algorithms and characterization results are obtained. The setup in the case of a binary alphabet is studied in detail.
TR@TUCS, BIB
Subword Occurrences, Parikh Matrices and Lyndon Images
Arto Salomaa and Sheng Yu
IJFCS, Volume 21(1), pp 91-111. (2010)
Abstract. We investigate the number of occurrences of a word $u$ as a (scattered) subword of a word $w$. The notion of a Parikh matrix, recently introduced, is a basic tool in this investigation. In general, several words are associated with a Parikh matrix. The ambiguity can be resolved by associating a unique word called the Lyndon image to each Parikh matrix. In this paper we will investigate properties of Lyndon images and the corresponding questions of ambiguity. We give an exhaustive characterization in the case of a binary alphabet. Our main results in the general case deal with the comparison of unambiguous words and Lyndon images, algorithms for constructing Lyndon images, as well as classes of words with the same Parikh matrix, obtained by circular variance.
TR@TUCS, IJFCS, BIB
Some Alternatives to Parikh Matrices Using String Kernels
Alexander Clark and Chris Watkins
Fundamenta Informaticae 84(3-4): 291-303 (2008)
Abstract. We describe methods of representing strings as real valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in the study of combinatorics over words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices, that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyperparameter the resulting morphism is injective.
Electronic Edition, BIB
On Subword Symmetry of Words
Anton Černý
IJFCS 19(1): 243-250 (2008)
Abstract. We call a word L-symmetric with respect to a finite language L if it contains the same number of scattered subwords u as of $u^R$ for every word u from L. We show that increasing the size of the language L may lead to an unlimited refinement of the language of L-symmetric words. Further we prove that if a long enough initial segment of a D0L-sequence consists entirely of L-symmetric words, then all words in the sequence are L-symmetric.
DOI, BIB
On inequalities between subword histories
Szilárd Zsolt Fazekas
IJFCS 19(4): 1039-1047 (2008)
Abstract. In this paper we study subword inequalities, that is, inequalities between the number of occurrences of certain scattered subwords in a word. The problem of deciding whether a subword inequality holds for all words was proposed in [5]. We provide partial results describing a few cases in which the subword inequalities hold. Whether a subword inequality falls in these categories is decidable. However, the general question remains open.
DOI, BIB
Subword histories and associated matrices
Arto Salomaa
J. of TCS 407(1-3): 250-257 (2008)
Abstract. The basic numerical quantity investigated in this paper is $|w|_u$, the number of occurrences of a word u as a scattered subword of the word w. Arithmetical combinations of such quantities yield a so-called subword history. We investigate the information content of subword histories. Reducing subword histories to linear ones, as well as the recently introduced Parikh matrices, will be important tools. Simple polynomial formulas for computing the value of a subword history for arbitrary powers of a word are obtained.
DOI, BIB
Parikh matrices and amiable words
J. of TCS 390(1): 102-109 (2008)
Abstract. Using the fact that the Parikh matrix mapping is not an injective mapping, the paper investigates some properties of the set of words with the same Parikh matrix; these words are called “amiable”. The presented results extend the results obtained in [A. Atanasiu, Binary amiable words, Int. J. Found. Comput. Sci. 18 (2) (2007) 387–400] for the binary case. In particular it is shown that all the words having the same Parikh matrix can be obtained one from another by applying only two types of transformations. Moreover, the mirrors of two amiable words are also amiable (thus forming a symmetrical class of words).
DOI, BIB
On fairness of D0L systems
Anton Černý
Discrete Applied Mathematics 155(3), pp 1769-1773. 2007
Abstract. A word is called fair if it contains, for each pair of distinct symbols a,b, the same number of occurrences of the scattered subword ab as of ba. We prove that if the first k+1 words in the sequence generated by a D0L system over a k-letter alphabet are fair then all words in the sequence are fair.
DOI, BIB
Comparing Subword Occurrences in Binary D0L Sequences
Arto Salomaa
IJFCS 18(6): 1395-1406. 2007
Abstract. We investigate binary words and ω-words, mainly those generated by D0L systems. We compare the numbers of occurrences of ab and ba as (scattered) subwords by introducing a 'difference function'. Of special interest are the 'balanced' or 'fair' words for which the function assumes the value 0. If the three first words in a D0L sequence are fair, so are all words in the sequence. Two kinds of matrices constitute a useful technical tool in the study. An explicit formula, leading to various consequences, is obtained for the difference function.
DOI, BIB
Subword Balance in Binary Words, Languages and Sequences
Arto Salomaa
Fundamenta Informaticae, 75(1-4):469-482 (2007)
Abstract. The paper investigates inference based on quantities $|w|_u$, the number of occurrences of a word u as a scattered subword of the word w. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers $|w|_u$ to w, as well as from certain entries of a Parikh matrix to other entries.
IOSPRESS, BIB
Binary Amiable Words
IJFCS 18(2): 387-400 (2007)
Abstract. Using the fact that the Parikh matrix mapping is not an injective mapping, the paper investigates some properties of the set of the binary words having the same Parikh matrix; these words are called "amiable" Some results concerning the conditions when the equivalence classes of amiable words have more than one element, a characterization theorem concerning a graph associated to an equivalence class of amiable words, and some basic properties of a rank distance defined on these classes are the main subjects considered here.
DOI, BIB
Subword conditions and subword histories
Arto Salomaa and Sheng Yu
Information and Computation 204(12): 1741-1755, 2006
Abstract. This paper introduces the notion of a subword condition and investigates languages defined by them. The special case, where the language reduces to one word, concerns the inference of a sequence from its subsequences. We obtain various characterization and decidability results for languages defined by subword conditions. The results contribute to the theory of Parikh matrices and arithmetizing the study of words. An important notion from early automata theory, that of a quasi-uniform event, plays a central role in our characterization.
DOI, BIB
Independence of certain quantities indicating subword occurrences
Arto Salomaa
J. of TCS 362(1-3): 222-231 (2006)
Abstract. When words are characterized in terms of numerical quantities, awkward considerations due to the noncommutativity of words are avoided. The numerical quantity investigated in this paper is $|w|_u$, the number of occurrences of a word u as a (scattered) subword of a word w. Parikh matrices recently introduced have these quantities as their entries. According to the main result in this paper, no entry in a Parikh matrix, no matter how high the dimension, can be computed in terms of the other entries. Consequences concerning various inference problems between numbers $|w|_u$ themselves, as well as of the word w from these numbers, are obtained.
DOI, BIB
On Some Problems of Mateescu Concerning Subword Occurrences
Cunsheng Ding and Arto Salomaa
Fundamenta Informaticae 73(1-2): 65-79 (2006)
Abstract. The paper investigates inference based on quantities $\mid w \mid_u$, the number of occurrences of a word u as a scattered subword of w. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers $\mid w \mid_u$ to w, as well as from certain entries of a Parikh matrix to other entries.
IOSPRESS, BIB
Injectivity of the Parikh matrix mappings revisited
Virgil Nicolae Șerbănuță and Traian Florin Serbanuta
Fundamenta Informaticae, Volume 73(1-2), pp. 265-283. 2006
Abstract. We deal with the notion of M-unambiguity [5] in connection with the Parikh matrix mapping introduced by Mateescu and others in [7]. M-unambiguity is studied both in terms of words and matrices and several sufficient criteria for M-unambiguity are provided in both cases, nontrivially generalizing the criteria based on the $\gamma$-property introduced by Salomaa in [15]. Also, the notion of M-unambiguity with respect to a word is defined in connection with the extended Parikh matrix morphism [16] and some of the M-unambiguity criteria are lifted from the classical setting to the extended one.
This paper is an revised and extended version of [17].
PDF, IOSPRESS, BIB
Connections between subwords and certain matrix mappings
Arto Salomaa
J. of TCS 340(1): 188-203 (2005)
Abstract. Parikh matrices recently introduced have turned out to be a powerful tool in the arithmetizing of the theory of words. In particular, many inequalities between (scattered) subword occurrences have been obtained as consequences of the properties of the matrices. This paper continues the investigation of Parikh matrices and subword occurrences. In particular, we study certain inequalities, as well as information about subword occurrences sufficient to determine the whole word uniquely. Some algebraic considerations, facts about forbidden subwords, as well as some open problems are also included.
DOI, BIB
On the Injectivity of Parikh Matrix Mappings
Arto Salomaa
Fundamenta Informaticae 64(1-4): 391-404 (2005)
Abstract. We investigate the number of (scattered) subword occurrences and Parikh matrices, especially the case where the matrix determines the word uniquely. A condition introduced in this paper, called γ-property, turns out to be a powerful tool for such unambiguous matrices. Interconnections with the general theory of subword histories are also pointed out.
IOSPRESS, BIB
On Some Problems of Mateescu Concerning Subword Occurrences
Cunsheng Ding and Arto Salomaa
Technical Report TUCS-TR-701, August 2005
Abstract. The paper investigates inference based on quantities $\sca{u}$, the number of occurrences of a word $u$ as a scattered subword of $w$. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers $\sca{u}$ to $w$, as well as from certain entries of a Parikh matrix to other entries.
Tech. Report's page, BIB
On Languages Defined by Numerical Parameters
Arto Salomaa
Technical Report TUCS-TR-663. 2005
Abstract. The paper investigates definitions of languages based on Boolean combinations of equations dealing with the number of subword occurrences. Complete characterizations are obtained for a certain class of subwords (a-separated). Interconnections to the theory of Parikh-matrices, as well as applications of some known results, are also studied.
Tech. Report's page, BIB
Some characterizations of Parikh matrix equivalent binary words
S. Fosse and G. Richomme
Information Processing Letters 92(2):77-82 (2004)
Abstract. We state different characterizations of pair of words having the same Parikh matrix. Keywords: Formal languages; Combinatorics on words; Subwords; Parikh matrix; Palindromes
DOI, BIB
Algebraic Aspects of Parikh Matrices
Alexandru Mateescu
Theory Is Forever, LNCS 3113: 170-180 (2004)
Abstract. This paper contains algebraic aspects of Parikh matrices. We present new, but also some old results, concerning this topic. It is proved that in some cases the set of Parikh matrices is a noncommutative semiring with a unit element. Also we prove that the set of Parikh matrices is closed under the operation of shuffle on trajectories and thus it is closed under many other operations. It is presented also the notion of extended Parikh matrix that it is an extension of the notion of the Parikh matrix. The paper contains also a number of open problems.
LNCS, BIB
Matrix Indicators For Subword Occurrences And Ambiguity
Alexandru Mateescu and Arto Salomaa
IJFCS 15(2): 277-292 (2004)
Abstract. The paper investigates inequalities between the numbers of different (scattered) subword occurrences. The Parikh matrix recently introduced is an efficient tool. We give various characterizations for Parikh matrices. Of special interest is the case where the matrix determines the word uniquely. We investigate such matrix unambiguous words. The considerations are extended to concern languages. Several open problems and problem areas are indicated.
IJFCS, Technical Report, BIB
Subword histories and Parikh matrices
Alexandru Mateescu and Arto Salomaa and Sheng Yu
JCSS 68(1): 1-21 (2004)
Abstract. Parikh matrices recently introduced give much more information about a word than just the number of occurrences of each letter. In this paper we introduce the closely related notion of a subword history and obtain a sequence of general results: elimination of products, decidability of equivalence, and normal form. We also investigate overall methods for proving the validity of such results. A general inequality of "Cauchy type" for subword occurrences is established.
JCSS, BIB
Extending Parikh Matrices
Traian Florin Serbanuta
J. of TCS, Volume 310(1), pp 233-246. 2004
Abstract. We introduce the notion of Parikh matrix induced by a word, a natural extension to the notion of Parikh matrix and prove a set of properties for this kind of matrices.We also study the relation between these two notions. We show that combining properties from both we obtain a more powerful tool for proving algebraic properties of words.
PDF, J.TCS, DBLP, BIB
Counting (scattered) Subwords
Arto Salomaa
B. of EATCS 81: 165-179 (2003)
Abstract. An important numerical parameter associated to a word w is the number of occurrences of another word u as a (scattered) subword of w. The knowledge of sufficiently many such numbers characterizes the original word completely. The paper investigates methods of counting such numbers, both combinatorial and those based on Parikh matrices. The main results concern certain characteristic inequalities.
BIB
On the Injectivity of the Parikh Matrix Mapping
Adrian Atanasiu, Carlos Martin-Vide and Alexandru Mateescu
Fundamenta Informaticae 49(4): 289-299 (2002)
Abstract. In this paper we investigate the injectivity of the Parikh matrix mapping. This research is done mainly on the binary alphabet. We identify a family of binary words, refered to as "palindromic amiable", such that two such words are palindromic amiable if and only if they have the same image by the Parikh matrix mapping. Some other related problems are discussed, too.
BIB
Codifiable languages and the Parikh matrix mapping
Adrian Atanasiu, Carlos Martin-Vide and Alexandru Mateescu
JUCS 7(9): 783–793 (2001)
Abstract.
JUCS, BIB
A sharpening of the Parikh mapping
Alexandru Mateescu, Arto Salomaa, Kai Salomaa and Sheng Yu
ITA 35(6): 551-564 (2001)
Abstract. In this paper we introduce a sharpening of the Parikh mapping and investigate its basic properties. The new mapping is based on square matrices of a certain form. The classical Parikh vector appears in such a matrix as the second diagonal. However, the matrix product gives more information about a word than the Parikh vector. We characterize the matrix products and establish also an interesting interconnection between mirror images of words and inverses of.
ITA, BIB
On an Extension of the Parikh Mapping
Alexandru Mateescu, Arto Salomaa, Kai Salomaa and Sheng Yu
Technical Report TUCS-TR-364, September 2000
Abstract. In this paper we introduce an extension of the Parikh mapping and investigate some of its basic properties. The extension is based on square matrices of a certain form. The classical Parikh vector appears in such a matrix as the second diagonal. However, the matrix product gives more information about a word than the Parikh vector. We characterize the matrix products and establish also an interesting interconnection between mirror images of words and inverses of matrices.
Tech. Report's page, citeseer, BIB