![]() ![]() You are distributing (and replicating) your large dataset in small fixed chunks over many nodes. In fact, Spark is versatile enough to work with other file systems than Hadoop - like Amazon S3 or Databricks (DBFS).īut the idea is always the same. This presents new concepts like nodes, lazy evaluation, and the transformation-action (or ‘map and reduce’) paradigm of programming. Instead, it is a framework working on top of HDFS. Remember, Spark is not a new programming language that you have to learn. You could also run one on an Amazon EC2 if you want more storage and memory. However, if you are proficient in Python/Jupyter and machine learning tasks, it makes perfect sense to start by spinning up a single cluster on your local machine. The above options cost money just to even start learning (Amazon EMR is not included in the one-year Free Tier program unlike EC2 or S3 instances).
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |