Code:
from collections import defaultdict
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Map chapter title -> list of chapter texts seen, used to detect duplicates.
txt_dict = defaultdict(list)

def extract_text(link, level=0, MAX_LEVEL=1000):
    # Bail out if we've recursed too deep.
    if level == MAX_LEVEL:
        return
    with urlopen(link) as response:
        soup = BeautifulSoup(response, 'html.parser')
    title = soup.title.contents[0]

    # Collect the chapter body text.
    txt = ''
    for c in soup.find_all('div', {'class': "chapter-content"}):
        for x in c.strings:
            txt += x + "\n"
    print(level, title)

    # Skip chapters we've already seen (same title AND same text).
    if title in txt_dict:
        for _t in txt_dict[title]:
            if _t == txt:
                print("\tDUPLICATE")
                return
    txt_dict[title].append(txt)

    wordcount = len(txt.split())
    print("\t", wordcount)

    # Recurse into each choice link, skipping the login link.
    # (Use a separate name for child links so we don't clobber this
    # chapter's own `link`, which is returned below.)
    children = []
    for c in soup.find_all('div', {'class': "question-content"}):
        for x in c.find_all('a'):
            child_link = x.get('href', '/')
            if child_link != 'https://chyoa.com/auth/login':
                children.append(extract_text(child_link, level + 1))

    return {'title': title, 'text': txt, 'link': link,
            'wordcount': wordcount, 'children': children}

def sum_wordcounts(results):
    # Duplicate or depth-limited chapters come back as None; count them as zero.
    if results is None:
        return 0
    return results['wordcount'] + sum(sum_wordcounts(r) for r in results['children'])

Run as follows (on the root link of your story):

Code:
results = extract_text('https://chyoa.com/story/Vampire-Newborn.20536')

This walks the chapters and builds the tree structure of the story. Then:

Code:
print("wordcount = ", sum_wordcounts(results))
Funny, I also wrote a quickie Python script to track hits by hour. I was curious when the story was being read, and found 7-10 PST is my prime time. I take a snapshot of the side panel and just throw it into a file; it could be far more sophisticated, but I didn't want to spend much time on it.
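For anyone curious, the hour-tracking idea can be sketched roughly like this. Big assumption up front: I'm pretending the snapshot file holds one `ISO-timestamp cumulative-hits` pair per line; the real side-panel snapshot format will differ, so adapt the parsing to whatever you actually save.

Code:
from collections import Counter
from datetime import datetime

def hits_per_hour(path):
    # Hypothetical file format, one sample per line:
    #   2023-05-01T19:05:00 1234
    # where the number is the cumulative hit count at snapshot time.
    samples = []
    with open(path) as f:
        for line in f:
            ts, hits = line.split()
            samples.append((datetime.fromisoformat(ts), int(hits)))
    samples.sort()

    # Difference consecutive cumulative counts to get hits per interval,
    # then bucket each interval by the hour of day it started in.
    by_hour = Counter()
    for (t0, h0), (t1, h1) in zip(samples, samples[1:]):
        by_hour[t0.hour] += max(h1 - h0, 0)
    return by_hour

Calling `hits_per_hour('snapshots.txt')` gives a Counter keyed by hour (0-23) that you can print or plot to find your own prime time.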