lexyeevee




i wanted to scrape a few pages on doomworld but this became An Adventure because it turns out

(a) the entire page layout from some years ago is just fucking tables in tables in tables so it's a massive pain in the ass to identify anything

(b) a bunch of prose seems to have been written by a wysiwyg thing so even though the formatting looks the same across several pages, it is a fucking nightmare to wade through. sometimes a thing is <a href="..."><font size="+1"><b> but sometimes the same thing is <b><font size="+1"><a href="..."> because of course there are no heading elements, that's too hard. there are spurious wrapper <div align="center"> and <div align="left"> and if they're around an <hr> then it's basically random which one you get. just when i thought i'd nailed something down i found <p><b></b><strong>Title</strong></p>, as in, there was an empty <b> immediately followed by a <strong> containing the actual text. what the fuck is going on here
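
fwiw, a rough sketch of one way to stop caring about the nesting order, using only the stdlib html.parser. it treats <b> and <strong> as the same thing, ignores whatever wrappers sit around or inside them, and skips empty ones. the class and function names are made up for illustration, and on a real page this would also pick up every other bit of bold text, so it's a starting point and nothing more.

```python
from html.parser import HTMLParser

BOLD_TAGS = {"b", "strong"}  # treated as interchangeable


class BoldTextCollector(HTMLParser):
    """Collect non-empty text inside <b>/<strong>, regardless of nesting order."""

    def __init__(self):
        super().__init__()
        self.bold_depth = 0   # how many bold-ish tags are currently open
        self.current = []     # text pieces seen inside the current bold run
        self.found = []       # finished, non-empty bold runs

    def handle_starttag(self, tag, attrs):
        if tag in BOLD_TAGS:
            self.bold_depth += 1

    def handle_endtag(self, tag):
        if tag in BOLD_TAGS and self.bold_depth > 0:
            self.bold_depth -= 1
            if self.bold_depth == 0:
                text = "".join(self.current).strip()
                if text:  # drops the empty <b></b> case
                    self.found.append(text)
                self.current = []

    def handle_data(self, data):
        if self.bold_depth > 0:
            self.current.append(data)


def bold_texts(html):
    parser = BoldTextCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# the three shapes described above, all carrying the same text
samples = [
    '<a href="..."><font size="+1"><b>Title</b></font></a>',
    '<b><font size="+1"><a href="...">Title</a></font></b>',
    '<p><b></b><strong>Title</strong></p>',
]
results = [bold_texts(s) for s in samples]
print(results)
```

all three samples come out the same, which is the whole point: match on "is this text bold-ish" instead of on the exact tag sandwich.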

i think i could've just manually collected what i wanted in the time i've spent on this to crawl my way to like 30% success



ok so i'm in the market for a data file and i see python has a tomllib now. i've never actually used toml for anything but i thought i'd give it a whirl, why not.

first observation: you can't make, like, verbose lists. like [foo] is always a dict key, never a list entry. well that kind of sucks but ok. edit: disregard i suck cocks, [[foo]] headers (double brackets) do make lists, each one appends another entry. thank u @porglezomp

but then i see tomllib.loads takes a str, whereas tomllib.load takes a file specifically opened in binary mode. huh what

i check the stdlib source and

def load(fp: BinaryIO, /, *, parse_float: ParseFloat = float) -> dict[str, Any]:
    """Parse TOML from a binary file object."""
    b = fp.read()
    try:
        s = b.decode()
    except AttributeError:
        raise TypeError(
            "File must be opened in binary mode, e.g. use `open('foo.toml', 'rb')`"
        ) from None
    return loads(s, parse_float=parse_float)

????????????

who wrote this like this. WHY is this like this


edit: omg the spec even LITERALLY SAYS

A TOML file must be a valid UTF-8 encoded Unicode document.

so why is the stdlib using the default encoding?? ?? ? ???? ? ?


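wait it's not, bytes.decode() defaults to utf-8? that is news to me. ok so the whole idea is to enforce utf8 on a toml file: load wants bytes so it can do the decoding itself, instead of trusting whatever encoding you happened to open the file with. well then never mind this entire goddamn post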