I’m still struggling to find the book I want on data science. I’ve learned that there are two levels:

*KNOWING* data science

*DOING* data science

This book is about the second one. Make no mistake, this is a “statistical computation” manual. It shows you how to find statistical answers using Python. *Fully half of this book is code samples.* If you do not plan to actually find statistical answers to known questions by writing Python code, then this isn’t the book for you.

I would look at the code samples in this book and think, “What am I supposed to do with this? I’ll take the author’s word for it that this works, but what is it supposed to tell me?” The code samples don’t even show much inner computation, since most of the work is rolled up into Python libraries, and the code samples really just show magical method calls and the code *around* those. This is damn near a Python manual.

And I disagree with the title: “…from Scratch.” It’s not from scratch, and this is my major complaint: knowing how to find the answer is only the second half of the process. The real problem is the first half: *no one knows the right questions.* I can find or hire someone to give me the answer. Explain to me what questions I should be asking of data.

And this is where the book falls down. The scenarios described are enormously contrived, and they’re glossed over in a mad rush to get to the code samples (the very part I didn’t care about). I want more time spent on why the question matters. Real-world examples would be nice too.

I get that this might not be what the author was going for. But I fault him for the title. It’s not “Data Science from Scratch.” It’s “How To Compute Statistics with Python.” I guess I should have paid more attention to the subtitle.

So, this is my problem with this book, and with just about every book I’ve read on data science in the last two years. All these books are written by statisticians who are very quick to show you math (or code). I want a book written by a business person that starts with the idea of what solutions we can unlock from our data.

How to find those solutions? That’s a readily solvable problem.