# Parikh Matrices

From FSL

Parikh Matrices are a newly developed tool for studying numerical properties of words in terms of their (scattered) subwords. They were introduced by Mateescu et al. in 2000 and continuously received attention from the research community ever since.

Below is a list of publication on this matter. You can also try the online Parikh Matrix calculator developed by Virgil Nicolae Șerbănuță.

**Subword balance, position indices and power sums**- Arto Salomaa
, Volume 76(8), pp 861-871. 2010**J. of CSS**

*Abstract.*In this paper, we investigate various ways of characterizing words, mainly over a binary alphabet, using information about the positions of occurrences of letters in words. We introduce two new measures associated with words, the position index and sum of position indices. We establish some characterizations, connections with Parikh matrices, and connections with power sums. One particular emphasis concerns the effect of morphisms and iterated morphisms on words.

- J.CSS, BIB
**On Parikh Matrices and Ambiguity**- Virgil Serbanuta
, University of Bucharest, April 2010**PhD Thesis**

*Abstract.*The theory of subwords is a part of combinatorics of words dealing with properties of words related to classical non-commutative algebra. The main problem of this theory can be described as follows: “Can a word be computed knowing a part of the set of its subwords (with or without multiplicities)?”. This thesis investigates various ways to uniquely characterize words by the number of occurrences of special types of subwords. First, ways to reduce the set of subwords that need to be used are investigated. After this, the print of a word (obtained by replacing all sequences of identical letters by only one letter) is used to gain a deeper understanding of words that can be uniquely identiﬁed by the subwords considered and of the class of languages their language could belong. In the end these results are extended to larger sets of subwords, and the number of occurrences for any subword is identiﬁed to be a polynomial.

- PDF, BIB
**On Parikh Matrices, Ambiguity, and Prints**- Virgil Nicolae Șerbănuță
20(1) pp 151-165. 2009**IJFCS**

*Abstract.*In the algebraic study of words and languages it is often convenient to analyze words as numerical quantities. One such now-famous example is the study of words by using their attached Parikh mapping (or vector). However, the Parikh vector abstracts away too much of the structure of the word. Parikh matrices were introduced as a means to obtain more than just the number of occurrences of single letters. When studying properties of words, an important property is that a word is uniquely determined by the number of occurrences of certain predetermined subwords. In the context of Parikh matrices, the problem is translated in finding which words are completely characterized by their associated Parikh matrix. This paper links this problem with that of the print of a word, i.e. the word obtained by considering consecutive occurrences of the same letter as only one letter. We obtain results regarding finiteness and context-freeness of such classes of words.

- PDF, DOI
**Charasteristic Words for Parikh Matrices**- Arto Salomaa
, pp 117-127. 2009**Automata, Formal Languages, and Related Topics**

*Abstract.*Characterizing words by numerical quantities eliminates the undesirable noncommutativity. The basic numerical quantity investigated in this paper is $\sca{u}$, the number of occurrences of a word $u$ as a scattered subword of a word $w$. Recently introduced Parikh matrices constitute a powerful tool in this investigation. To remove the ambiguity present in the matrices, we associate in this paper a unique word, the Lyndon image, to each matrix. Lyndon images are studied both in connection with matrices and as a language-theoretic notion. Algorithms and characterization results are obtained. The setup in the case of a binary alphabet is studied in detail.

- TR@TUCS, BIB
**Subword Occurrences, Parikh Matrices and Lyndon Images**- Arto Salomaa and Sheng Yu
, Volume 21(1), pp 91-111. (2010)**IJFCS**

*Abstract.*We investigate the number of occurrences of a word as a (scattered) subword of a word . The notion of a Parikh matrix, recently introduced, is a basic tool in this investigation. In general, several words are associated with a Parikh matrix. The ambiguity can be resolved by associating a unique word called the Lyndon image to each Parikh matrix. In this paper we will investigate properties of Lyndon images and the corresponding questions of ambiguity. We give an exhaustive characterization in the case of a binary alphabet. Our main results in the general case deal with the comparison of unambiguous words and Lyndon images, algorithms for constructing Lyndon images, as well as classes of words with the same Parikh matrix, obtained by circular variance.

- TR@TUCS, IJFCS, BIB
**Some Alternatives to Parikh Matrices Using String Kernels**- Alexander Clark and Chris Watkins
84(3-4): 291-303 (2008)**Fundamenta Informaticae**

*Abstract.*We describe methods of representing strings as real valued vectors or matrices; we show how to integrate two separate lines of enquiry: string kernels, developed in machine learning, and Parikh matrices [8], which have been studied intensively over the last few years as a powerful tool in the study of combinatorics over words. In the field of machine learning, there is widespread use of string kernels, which use analogous mappings into high dimensional feature spaces based on the occurrences of subwords or factors. In this paper we show how one can use string kernels to construct two alternatives to Parikh matrices, that overcome some of the limitations of the Parikh matrix construction. These are morphisms from the free monoid to rings of real-valued matrices under multiplication: one is based on the subsequence kernel and the other on the gap-weighted string kernel. For the latter kernel we demonstrate that for many values of the gap-weight hyperparameter the resulting morphism is injective.

- Electronic Edition, BIB
**On Subword Symmetry of Words**- Anton Černý
19(1): 243-250 (2008)**IJFCS**

*Abstract.*We call a word L-symmetric with respect to a finite language L if it contains the same number of scattered subwords u as of for every word u from L. We show that increasing the size of the language L may lead to an unlimited refinement of the language of L-symmetric words. Further we prove that if a long enough initial segment of a D0L-sequence consists entirely of L-symmetric words, then all words in the sequence are L-symmetric.

- DOI, BIB
**On inequalities between subword histories**- Szilárd Zsolt Fazekas
19(4): 1039-1047 (2008)**IJFCS**

*Abstract.*In this paper we study subword inequalities, that is, inequalities between the number of occurrences of certain scattered subwords in a word. The problem of deciding whether a subword inequality holds for all words was proposed in [5]. We provide partial results describing a few cases in which the subword inequalities hold. Whether a subword inequality falls in these categories is decidable. However, the general question remains open.

- DOI, BIB
**Subword histories and associated matrices**- Arto Salomaa
407(1-3): 250-257 (2008)**J. of TCS**

*Abstract.*The basic numerical quantity investigated in this paper is , the number of occurrences of a word*u*as a scattered subword of the word*w*. Arithmetical combinations of such quantities yield a so-called subword history. We investigate the information content of subword histories. Reducing subword histories to linear ones, as well as the recently introduced Parikh matrices, will be important tools. Simple polynomial formulas for computing the value of a subword history for arbitrary powers of a word are obtained.

- DOI, BIB
**Parikh matrices and amiable words**- Adrian Atanasiu, Radu Atanasiu and Ion Petre
390(1): 102-109 (2008)**J. of TCS**

*Abstract.*Using the fact that the Parikh matrix mapping is not an injective mapping, the paper investigates some properties of the set of words with the same Parikh matrix; these words are called “amiable”. The presented results extend the results obtained in [A. Atanasiu, Binary amiable words, Int. J. Found. Comput. Sci. 18 (2) (2007) 387–400] for the binary case. In particular it is shown that all the words having the same Parikh matrix can be obtained one from another by applying only two types of transformations. Moreover, the mirrors of two amiable words are also amiable (thus forming a symmetrical class of words).

- DOI, BIB
**On fairness of D0L systems**- Anton Černý
155(3), pp 1769-1773. 2007**Discrete Applied Mathematics**

*Abstract.*A word is called fair if it contains, for each pair of distinct symbols a,b, the same number of occurrences of the scattered subword ab as of ba. We prove that if the first k+1 words in the sequence generated by a D0L system over a k-letter alphabet are fair then all words in the sequence are fair.

- DOI, BIB
**Comparing Subword Occurrences in Binary D0L Sequences**- Arto Salomaa
18(6): 1395-1406. 2007**IJFCS**

*Abstract.*We investigate binary words and ω-words, mainly those generated by D0L systems. We compare the numbers of occurrences of ab and ba as (scattered) subwords by introducing a 'difference function'. Of special interest are the 'balanced' or 'fair' words for which the function assumes the value 0. If the three first words in a D0L sequence are fair, so are all words in the sequence. Two kinds of matrices constitute a useful technical tool in the study. An explicit formula, leading to various consequences, is obtained for the difference function.

- DOI, BIB
**Subword Balance in Binary Words, Languages and Sequences**- Arto Salomaa
, 75(1-4):469-482 (2007)**Fundamenta Informaticae**

*Abstract.*The paper investigates inference based on quantities , the number of occurrences of a word*u*as a scattered subword of the word*w*. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers to*w*, as well as from certain entries of a Parikh matrix to other entries.

- IOSPRESS, BIB
**Binary Amiable Words**- Adrian Atanasiu
18(2): 387-400 (2007)**IJFCS**

*Abstract.*Using the fact that the Parikh matrix mapping is not an injective mapping, the paper investigates some properties of the set of the binary words having the same Parikh matrix; these words are called "amiable" Some results concerning the conditions when the equivalence classes of amiable words have more than one element, a characterization theorem concerning a graph associated to an equivalence class of amiable words, and some basic properties of a rank distance defined on these classes are the main subjects considered here.

- DOI, BIB
**Subword conditions and subword histories**- Arto Salomaa and Sheng Yu
204(12): 1741-1755, 2006**Information and Computation**

*Abstract.*This paper introduces the notion of a subword condition and investigates languages defined by them. The special case, where the language reduces to one word, concerns the inference of a sequence from its subsequences. We obtain various characterization and decidability results for languages defined by subword conditions. The results contribute to the theory of Parikh matrices and arithmetizing the study of words. An important notion from early automata theory, that of a quasi-uniform event, plays a central role in our characterization.

- DOI, BIB
**Independence of certain quantities indicating subword occurrences**- Arto Salomaa
362(1-3): 222-231 (2006)**J. of TCS**

*Abstract.*When words are characterized in terms of numerical quantities, awkward considerations due to the noncommutativity of words are avoided. The numerical quantity investigated in this paper is , the number of occurrences of a word u as a (scattered) subword of a word w. Parikh matrices recently introduced have these quantities as their entries. According to the main result in this paper, no entry in a Parikh matrix, no matter how high the dimension, can be computed in terms of the other entries. Consequences concerning various inference problems between numbers themselves, as well as of the word w from these numbers, are obtained.

- DOI, BIB
**On Some Problems of Mateescu Concerning Subword Occurrences**- Cunsheng Ding and Arto Salomaa
73(1-2): 65-79 (2006)**Fundamenta Informaticae**

*Abstract.*The paper investigates inference based on quantities , the number of occurrences of a word u as a scattered subword of w. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers to w, as well as from certain entries of a Parikh matrix to other entries.

- IOSPRESS, BIB
**Injectivity of the Parikh matrix mappings revisited**- Virgil Nicolae Șerbănuță and Traian Florin Serbanuta
, Volume 73(1-2), pp. 265-283. 2006**Fundamenta Informaticae**

*Abstract.*We deal with the notion of M-unambiguity [5] in connection with the Parikh matrix mapping introduced by Mateescu and others in [7]. M-unambiguity is studied both in terms of words and matrices and several sufficient criteria for M-unambiguity are provided in both cases, nontrivially generalizing the criteria based on the -property introduced by Salomaa in [15]. Also, the notion of M-unambiguity with respect to a word is defined in connection with the extended Parikh matrix morphism [16] and some of the M-unambiguity criteria are lifted from the classical setting to the extended one.

This paper is an revised and extended version of [17].

- PDF, IOSPRESS, BIB
**Connections between subwords and certain matrix mappings**- Arto Salomaa
340(1): 188-203 (2005)**J. of TCS**

*Abstract.*Parikh matrices recently introduced have turned out to be a powerful tool in the arithmetizing of the theory of words. In particular, many inequalities between (scattered) subword occurrences have been obtained as consequences of the properties of the matrices. This paper continues the investigation of Parikh matrices and subword occurrences. In particular, we study certain inequalities, as well as information about subword occurrences sufficient to determine the whole word uniquely. Some algebraic considerations, facts about forbidden subwords, as well as some open problems are also included.

- DOI, BIB
**On the Injectivity of Parikh Matrix Mappings**- Arto Salomaa
64(1-4): 391-404 (2005)**Fundamenta Informaticae**

*Abstract.*We investigate the number of (scattered) subword occurrences and Parikh matrices, especially the case where the matrix determines the word uniquely. A condition introduced in this paper, called γ-property, turns out to be a powerful tool for such unambiguous matrices. Interconnections with the general theory of subword histories are also pointed out.

- IOSPRESS, BIB
**On Some Problems of Mateescu Concerning Subword Occurrences**- Cunsheng Ding and Arto Salomaa
TUCS-TR-701, August 2005**Technical Report**

*Abstract.*The paper investigates inference based on quantities $\sca{u}$, the number of occurrences of a word $u$ as a scattered subword of $w$. Parikh matrices recently introduced are useful tools for such investigations. We introduce and study universal languages for Parikh matrices. We also obtain results concerning the inference from numbers $\sca{u}$ to $w$, as well as from certain entries of a Parikh matrix to other entries.

- Tech. Report's page, BIB
**On Languages Defined by Numerical Parameters**- Arto Salomaa
TUCS-TR-663. 2005**Technical Report**

*Abstract.*The paper investigates definitions of languages based on Boolean combinations of equations dealing with the number of subword occurrences. Complete characterizations are obtained for a certain class of subwords (*a-separated*). Interconnections to the theory of Parikh-matrices, as well as applications of some known results, are also studied.

- Tech. Report's page, BIB
**Some characterizations of Parikh matrix equivalent binary words**- S. Fosse and G. Richomme
92(2):77-82 (2004)**Information Processing Letters**

*Abstract.*We state different characterizations of pair of words having the same Parikh matrix. Keywords: Formal languages; Combinatorics on words; Subwords; Parikh matrix; Palindromes

- DOI, BIB
**Algebraic Aspects of Parikh Matrices**- Alexandru Mateescu
, LNCS 3113: 170-180 (2004)**Theory Is Forever**

*Abstract.*This paper contains algebraic aspects of Parikh matrices. We present new, but also some old results, concerning this topic. It is proved that in some cases the set of Parikh matrices is a noncommutative semiring with a unit element. Also we prove that the set of Parikh matrices is closed under the operation of shuffle on trajectories and thus it is closed under many other operations. It is presented also the notion of extended Parikh matrix that it is an extension of the notion of the Parikh matrix. The paper contains also a number of open problems.

- LNCS, BIB
**Matrix Indicators For Subword Occurrences And Ambiguity**- Alexandru Mateescu and Arto Salomaa
15(2): 277-292 (2004)**IJFCS**

*Abstract.*The paper investigates inequalities between the numbers of different (scattered) subword occurrences. The Parikh matrix recently introduced is an efficient tool. We give various characterizations for Parikh matrices. Of special interest is the case where the matrix determines the word uniquely. We investigate such matrix unambiguous words. The considerations are extended to concern languages. Several open problems and problem areas are indicated.

- IJFCS, Technical Report, BIB
**Subword histories and Parikh matrices**- Alexandru Mateescu and Arto Salomaa and Sheng Yu
68(1): 1-21 (2004)**JCSS**

*Abstract.*Parikh matrices recently introduced give much more information about a word than just the number of occurrences of each letter. In this paper we introduce the closely related notion of a subword history and obtain a sequence of general results: elimination of products, decidability of equivalence, and normal form. We also investigate overall methods for proving the validity of such results. A general inequality of "Cauchy type" for subword occurrences is established.

- JCSS, BIB
**Extending Parikh Matrices**- Traian Florin Serbanuta
, Volume 310(1), pp 233-246. 2004**J. of TCS**

*Abstract.*We introduce the notion of Parikh matrix induced by a word, a natural extension to the notion of Parikh matrix and prove a set of properties for this kind of matrices.We also study the relation between these two notions. We show that combining properties from both we obtain a more powerful tool for proving algebraic properties of words.

- PDF, J.TCS, DBLP, BIB
**Counting (scattered) Subwords**- Arto Salomaa
81: 165-179 (2003)**B. of EATCS**

*Abstract.*An important numerical parameter associated to a word w is the number of occurrences of another word u as a (scattered) subword of w. The knowledge of sufficiently many such numbers characterizes the original word completely. The paper investigates methods of counting such numbers, both combinatorial and those based on Parikh matrices. The main results concern certain characteristic inequalities.

- BIB
**On the Injectivity of the Parikh Matrix Mapping**- Adrian Atanasiu, Carlos Martin-Vide and Alexandru Mateescu
49(4): 289-299 (2002)**Fundamenta Informaticae**

*Abstract.*In this paper we investigate the injectivity of the Parikh matrix mapping. This research is done mainly on the binary alphabet. We identify a family of binary words, refered to as "palindromic amiable", such that two such words are palindromic amiable if and only if they have the same image by the Parikh matrix mapping. Some other related problems are discussed, too.

- BIB
**Codifiable languages and the Parikh matrix mapping**- Adrian Atanasiu, Carlos Martin-Vide and Alexandru Mateescu
7(9): 783–793 (2001)**JUCS**

*Abstract.*

- JUCS, BIB
**A sharpening of the Parikh mapping**- Alexandru Mateescu, Arto Salomaa, Kai Salomaa and Sheng Yu
35(6): 551-564 (2001)**ITA**

*Abstract.*In this paper we introduce a sharpening of the Parikh mapping and investigate its basic properties. The new mapping is based on square matrices of a certain form. The classical Parikh vector appears in such a matrix as the second diagonal. However, the matrix product gives more information about a word than the Parikh vector. We characterize the matrix products and establish also an interesting interconnection between mirror images of words and inverses of.

- ITA, BIB
**On an Extension of the Parikh Mapping**- Alexandru Mateescu, Arto Salomaa, Kai Salomaa and Sheng Yu
TUCS-TR-364, September 2000**Technical Report**

*Abstract.*In this paper we introduce an extension of the Parikh mapping and investigate some of its basic properties. The extension is based on square matrices of a certain form. The classical Parikh vector appears in such a matrix as the second diagonal. However, the matrix product gives more information about a word than the Parikh vector. We characterize the matrix products and establish also an interesting interconnection between mirror images of words and inverses of matrices.