I'm trying to count the letters in a text, but I want to exclude any count where a letter is next to a space.

from collections import Counter

text = 'in the seas and let fowl multiply in the earth \nand the evening and the mo were the fifth day \nand gd slet the earth bring forth the livngke man in our image after our likeness and let d over all the'
txt = [x for x in text if x not in '\n']  # this is how I excluded the newlines

cc = Counter(zip(text, text[1:])).items() 
print(cc)

I want to remove every pair that contains a space, such as (('n', ' '), 4).

From this

dict_items([(('i', 'n'), 5), (('n', ' '), 4), ((' ', 't'), 8), (('t', 'h'), 12), (('h', 'e'), 8), (('e', ' '), 10), ((' ', 's'), 2), (('s', 'e'), 1), (('e', 'a'), 3), (('a', 's'), 1), (('s', ' '), 2), ((' ', 'a'), 5), (('a', 'n'), 6), (('n', 'd'), 5), (('d', ' '), 7), ((' ', 'l'), 4), (('l', 'e'), 3), .....)

to this

dict_items([(('i', 'n'), 5), (('t', 'h'), 12), (('h', 'e'), 8), (('s', 'e'), 1), (('e', 'a'), 3), (('a', 's'), 1), (('a', 'n'), 6), (('n', 'd'), 5), (('l', 'e'), 3), .....)

  • Can you please provide a minimal example with the expected output? 10 hours ago
  • I added the output I want. If I remove the spaces the way I did with \n, the text just becomes one block without spaces. Are there any other methods?
    – PPNGA
    10 hours ago
  • So, you want all pairs of characters from your text, as long as the pair doesn't include a space? Why do you need the pairs to count characters? Why not simply subtract the number of spaces from the length of the string?
    – Grismar
    10 hours ago
  • I'm creating a Markov chain transition matrix from the text, based on letter frequencies. I don't want to count transitions from a letter to a space or from a space to a letter. And when I remove the spaces, the last letter of one word and the first letter of the next word get counted as a pair, which I don't want.
    – PPNGA
    9 hours ago

1 Answer

The number of non-whitespace characters in a string:

import re

text = 'in the seas and let fowl multiply in the earth \nand the evening and the mo were the fifth day \nand gd slet the earth bring forth the livngke man in our image after our likeness and let d over all the'

no_space = re.sub(r'\s', '', text)  # raw string avoids an invalid-escape warning for \s
print(len(no_space))

Result:

156

If you need those pairs for something else:

pairs = list(zip(no_space, no_space[1:]))
print(pairs)

Result:

[('i', 'n'), ('n', 't'), ('t', 'h'), ('h', 'e'), ('e', 's'), ('s', 'e'), ...]
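
Note that removing the whitespace first produces pairs such as ('n', 't') that span two words. If those should be excluded as well, as the comments under the question suggest, one option is to leave the whitespace in place and skip any pair that contains it. A minimal sketch, assuming a shortened sample string (the names `sample` and `pair_counts` are made up for illustration):

```python
from collections import Counter

# Hypothetical shortened sample; substitute the full text from the question
sample = 'in the seas and let fowl \nand the evening'

# Count adjacent character pairs, skipping any pair in which
# either character is whitespace (space, newline, tab, ...)
pair_counts = Counter(
    (a, b)
    for a, b in zip(sample, sample[1:])
    if not a.isspace() and not b.isspace()
)

print(pair_counts.items())
```

Because the spaces are still present when the pairs are formed, the last letter of one word is never paired with the first letter of the next.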

I'm using re because this covers all whitespace by removing everything that matches \s (including newlines, tabs, spaces, etc.). If you don't want to use re for some reason, and are only interested in ignoring regular spaces and newlines:

no_space = ''.join(c for c in text if c not in ' \n')
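
Since the comments mention a Markov chain transition matrix, here is a rough sketch of how within-word pair counts could be turned into transition probabilities. It is only one possible approach, and it assumes a nested dict is an acceptable stand-in for a matrix; `sample`, `row_totals` and `transition` are hypothetical names:

```python
from collections import Counter, defaultdict

# Hypothetical shortened sample; substitute the full text from the question
sample = 'in the seas and let fowl'

# Adjacent pairs that do not involve whitespace
pair_counts = Counter(
    (a, b)
    for a, b in zip(sample, sample[1:])
    if not a.isspace() and not b.isspace()
)

# Total number of counted transitions leaving each letter
row_totals = Counter()
for (a, _), n in pair_counts.items():
    row_totals[a] += n

# transition[a][b] estimates P(next letter is b | current letter is a)
transition = defaultdict(dict)
for (a, b), n in pair_counts.items():
    transition[a][b] = n / row_totals[a]

print(dict(transition))
```

Letters that only ever appear at the end of a word have no outgoing transitions here, so they get no row at all.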
